What You'll Do
Design and implement solutions that enhance application reliability, performance, scalability, and resilience.
Build and maintain monitoring, alerting, observability, and telemetry to drive proactive detection and incident analysis.
Lead incident management efforts, perform root cause analysis, and implement action-based improvements.
Implement operational workflows using scripting, infrastructure as code (IaC), and configuration management tools.
Manage capacity, performance, and scaling solutions to forecast demand and optimize infrastructure.
Collaborate with engineering teams to embed operability, resilience, and security into application architectures.
Build and automate reliable deployments through CI/CD pipelines, release governance, and version control systems.
Maintain clear runbooks, architecture diagrams, and operational documentation that enable efficient production support.
Experience Required
Managing Kubernetes and containerized workloads (EKS, AKS, GKE), including scaling, networking, upgrades, and monitoring.
Experience with cloud platforms (AWS, Azure, or Google Cloud Platform) across compute, storage, networking, IAM, and cost governance.
Using observability and APM tools such as Dynatrace, Splunk, Prometheus, Grafana, Datadog, Elastic/ELK.
Strengthening security and compliance controls in regulated environments (e.g., PCI DSS, SOC 2), including secure management of workloads.
Infrastructure automation experience using Terraform, CloudFormation, Ansible, or similar tools.
Designing and maintaining CI/CD pipelines using Jenkins, GitLab CI, GitHub Actions, or Azure DevOps.
Scripting and automation using Bash, PowerShell, or Python.
Background in electricity, engineering, or military-related environments (preferred).
Good to Have
Certifications such as AWS SysOps Administrator, AWS DevOps Engineer, Google Cloud DevOps Engineer, or CKA.
Experience with legacy applications, IBM iSeries, and/or library systems.
Hands-on database operations and performance tuning (Oracle, SQL Server, PostgreSQL).
Prior experience as a major incident commander, stakeholder communicator, or ops lead/coordinator.
Experience with ITIL and ServiceNow (change, incident, and configuration management).