What You'll Do
Design and implement solutions that enhance application reliability, performance, scalability, and resilience.
Build and maintain monitoring, alerting, observability, and telemetry to drive proactive detection and incident analysis.
Lead incident management efforts, perform root cause analysis, and implement improvements based on the findings.
Implement operational workflows using scripting, IaC, and configuration management tools.
Manage capacity planning, performance, and scaling to forecast demand and optimize infrastructure.
Collaborate with engineering teams to embed operability, resilience, and security into application architectures.
Build and automate reliable deployments through CI/CD pipelines, release governance, and version control systems.
Maintain clear runbooks, architecture diagrams, and operational documentation that enable efficient production support.
Experience Required
Managing Kubernetes and containerized workloads (EKS, AKS, GKE), including scaling, networking, upgrades, and monitoring.
Working with cloud platforms (AWS, Azure, or Google Cloud Platform) across compute, storage, networking, IAM, and cost governance.
Using observability and APM tools such as Dynatrace, Splunk, Prometheus, Grafana, Datadog, or Elastic/ELK.
Strengthening security and compliance controls in regulated environments (e.g., PCI DSS, SOC 2), including secure management of workloads.
Automating infrastructure using Terraform, CloudFormation, Ansible, or similar tools.
Designing and maintaining CI/CD pipelines using Jenkins, GitLab CI, GitHub Actions, or Azure DevOps.
Scripting and automation using Bash, PowerShell, or Python.
Background in electrical, engineering, or military-related environments (preferred).
Good to Have
Certifications such as AWS SysOps Administrator, AWS DevOps Engineer, Google Cloud DevOps Engineer, or CKA.
Experience with legacy applications, IBM iSeries, and/or library systems.
Hands-on database operations and performance tuning (Oracle, SQL Server, PostgreSQL).
Prior experience as a major incident commander, stakeholder communicator, or ops lead/coordinator.
Experience with ITIL and ServiceNow (change, incident, and configuration management).