What You'll Do
Design and implement solutions that enhance application reliability, performance, scalability, and resilience.
Build and maintain monitoring, alerting, observability, and telemetry to drive proactive detection and incident analysis.
Lead incident management efforts, perform root cause analysis, and implement action-based improvements.
Implement operational workflows using scripting, infrastructure as code (IaC), and configuration management tools.
Manage capacity, performance, and scaling solutions to forecast demand and optimize infrastructure.
Collaborate with engineering teams to embed operability, resilience, and security into application architectures.
Build and automate reliable deployments through CI/CD pipelines, release governance, and version control systems.
Maintain clear runbooks, architecture diagrams, and operational documentation that enable efficient production support.
Experience Required
Managing Kubernetes and containerized workloads (EKS, AKS, GKE), including scaling, networking, upgrades, and monitoring.
Experience with cloud platforms (AWS, Azure, or Google Cloud Platform) across compute, storage, networking, IAM, and cost governance.
Using observability and APM tools such as Dynatrace, Splunk, Prometheus, Grafana, Datadog, Elastic/ELK.
Strengthening security and compliance controls in regulated environments (e.g., PCI DSS, SOC 2), including secure management of workloads.
Infrastructure automation experience using Terraform, CloudFormation, Ansible, or similar tools.
Designing and maintaining CI/CD pipelines using Jenkins, GitLab CI, GitHub Actions, or Azure DevOps.
Scripting and automation using Bash, PowerShell, or Python.
Experience in electricity, engineering, or military-related environments (preferred).
Good to Have
Certifications such as AWS SysOps Administrator, AWS DevOps Engineer, Google Cloud DevOps Engineer, or CKA.
Experience with legacy applications, IBM iSeries, and/or library systems.
Hands-on database operations and performance tuning (Oracle, SQL Server, PostgreSQL).
Prior experience as a major incident commander, stakeholder communicator, or ops lead/coordinator.
Experience with ITIL and ServiceNow (change, incident, and configuration management).