What You'll Do
Design and implement solutions that enhance application reliability, performance, scalability, and resilience.
Build and maintain monitoring, alerting, observability, and telemetry to drive proactive detection and incident analysis.
Lead incident management efforts, perform root cause analysis, and implement improvements based on the findings.
Implement operational workflows using scripting, infrastructure as code (IaC), and configuration management tools.
Manage capacity planning, performance, and scaling to forecast demand and optimize infrastructure.
Collaborate with engineering teams to embed operability, resilience, and security into application architectures.
Build and automate reliable deployments through CI/CD pipelines, release governance, and version control systems.
Maintain clear runbooks, architecture diagrams, and operational documentation that enable efficient production support.
Experience Required
Managing Kubernetes and containerized workloads (EKS, AKS, GKE), including scaling, networking, upgrades, and monitoring.
Experience with cloud platforms (AWS, Azure, or Google Cloud Platform) across compute, storage, networking, IAM, and cost governance.
Using observability and APM tools such as Dynatrace, Splunk, Prometheus, Grafana, Datadog, Elastic/ELK.
Strengthening security and compliance controls in regulated environments (e.g., PCI DSS, SOC 2), including secure management of workloads.
Infrastructure automation experience using Terraform, CloudFormation, Ansible, or similar tools.
Designing and maintaining CI/CD pipelines using Jenkins, GitLab CI, GitHub Actions, or Azure DevOps.
Scripting and automation using Bash, PowerShell, or Python.
Background in electrical, engineering, or military-related environments (preferred).
Good to Have
Certifications such as AWS SysOps Administrator, AWS DevOps Engineer, Google Cloud DevOps Engineer, or CKA (Certified Kubernetes Administrator).
Experience with legacy applications, IBM iSeries, and/or library systems.
Hands-on database operations and performance tuning (Oracle, SQL Server, PostgreSQL).
Prior experience as a major incident commander, stakeholder communicator, or ops lead/coordinator.
Experience with ITIL and ServiceNow (change, incident, and configuration management).