At Sauce Labs, we empower the world's top enterprises - like Walmart, Bank of America, and Indeed - to deliver quality web and mobile applications at speed. Our industry-leading platform ensures continuous quality across the SDLC, using AI-powered analytics to identify key quality signals from development through production. With our unified solution, teams can release and innovate with confidence, knowing their apps will always look, function, and perform exactly as they should. Backed by TPG and Riverwood Capital, we are shaping the future of digital confidence - join us!
The Role:
We are seeking an experienced and highly skilled SRE Engineering Manager to lead our Site Reliability Engineering team in EMEA. The ideal candidate will possess a strong technical background in SRE, observability, and infrastructure, coupled with proven leadership abilities to guide and mentor a team of engineers, drive strategic initiatives, and foster a culture of operational excellence.
Responsibilities:
Team Leadership & Management:
Lead, mentor, and grow a team of Site Reliability Engineers, fostering a collaborative and high-performing environment.
Conduct regular one-on-one meetings, provide constructive feedback, and support career development for team members.
Manage team priorities, allocate resources effectively, and ensure timely delivery of projects and initiatives.
Participate in the recruitment, interviewing, and hiring of new SRE talent.
Strategic Planning & Execution:
Drive the adoption and implementation of SRE best practices, including error budgets, post-mortems, and incident response.
Lead the definition, implementation, and tracking of Service Level Objectives (SLOs) across critical services.
Develop and execute strategic roadmaps for improving system reliability, scalability, and performance.
Champion cross-team initiatives and implementations by leveraging influence and building strong relationships with other managers, technical leads, and stakeholders.
Technical Oversight & Hands-on Contribution:
Provide hands-on technical guidance and expertise in SRE, Observability, and/or Infrastructure (On-Premises or Cloud) domains.
Oversee the design, implementation, and maintenance of robust observability systems (e.g., Grafana, Loki, Tempo, Prometheus, Thanos, Mimir, Jaeger).
Guide the team in managing and developing for Kubernetes-based workloads, ensuring stability and efficiency.
Ensure the effective deployment and management of cloud-based services in major cloud providers like GCP or AWS, including managed services such as GKE, EKS, GCE, EC2, S3, GCS, Lambda, Cloud Run, etc.
Contribute to technical discussions, provide architectural guidance, and ensure sound engineering decisions.
Demonstrate basic programming knowledge in Python or Go to support automation and tooling development.
Manage hybrid, GCP, and AWS resources using Infrastructure as Code.
Documentation & Communication:
Lead the creation and review of technical design documents, proposals, and architectural diagrams.
Communicate complex technical concepts and project updates clearly and concisely to both technical and non-technical stakeholders.
Ensure thorough documentation of systems, processes, and incident responses.
Required Skills:
Bachelor of Science degree in a relevant field, or equivalent practical experience.
1-2 years of leadership experience in highly technical teams.
5-8 years of hands-on technical expertise in an SRE, Observability, and/or Infrastructure (On-Premises or Cloud) role.
Basic programming knowledge in Python or Go.
Experience implementing Service Level Objectives (SLOs).
Experience driving cross-team initiatives and implementations by leveraging influence and relationships with other managers or technical leads.
Experience writing technical design documents, proposals, and diagrams.
Experience using observability systems such as Grafana, Loki, Tempo, Prometheus, Thanos, Mimir, Jaeger, or similar.
Experience managing or developing for Kubernetes-based workloads.
Experience deploying cloud-based services in a major cloud provider, such as GCP or AWS, including managed services such as GKE, EKS, GCE, EC2, S3, GCS, Lambda, Cloud Run, etc.
Experience leading and working with remote-first teams distributed across time zones.
Experience with Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, Ansible, Puppet, Salt, Chef, or similar.
Nice-to-Have Qualifications:
Prior experience as a technical lead in an SRE, Observability, and/or Infrastructure capacity.
Hands-on experience implementing and supporting OpenTelemetry collectors and related infrastructure.
Experience designing cloud-native solutions and systems.
We are a hybrid workplace that recognizes the importance of flexibility while valuing in-person collaboration and relationship building. As a result, Saucers located near an office location must be able and willing to come into the office. Those hired remotely must be able and willing to travel to an office as required by the specific role.
Security responsibilities at Sauce:
At Sauce, we are committed to supporting the health and safety of employees and properties, partnering with internal stakeholders to learn and act on ever-evolving security protocols and procedures. You’ll be expected to fully comply with all security policies and procedures at the department and org-wide level and to take a ‘security first’ approach to how we design, build, and run our products and services.
Published: 2 days ago
Expires: in 28 days
Contract type: Employment
Work mode: Remote