
Ready4S
Tech Lead SRE / Principal SRE Engineer needed to design and operate scalable systems on AWS. Requires 9+ years in SRE, strong Kubernetes and AWS skills, remote work available.
We are looking for a Tech Lead SRE / Principal SRE Engineer to join a team working directly on a proprietary, business-critical product in a fast-paced and dynamic environment. This is a hands-on role where you will have a real impact on system reliability, scalability, and technical direction.

The role

As a Technical SRE Lead / Principal Site Reliability Engineer, you will design, implement, and operate highly available and scalable systems built primarily on Kubernetes (AWS EKS). You will play a key role in setting technical standards, guiding engineers, and ensuring operational excellence across production environments.

You will work extensively with Terraform, ArgoCD, and GitHub Actions, applying GitOps principles and modern deployment strategies such as blue-green, canary releases, and feature flagging. The role requires strong troubleshooting skills, a deep understanding of distributed systems, and active participation in production support when needed.

Main responsibilities

- Design, operate, and troubleshoot Kubernetes clusters (AWS EKS) with a focus on networking, scalability, security, and reliability
- Architect and maintain highly available, fault-tolerant infrastructure on AWS using Infrastructure as Code (Terraform)
- Automate provisioning, deployment, and configuration processes following GitOps practices with ArgoCD and GitHub Actions
- Define and enforce guardrails for infrastructure, applications, and databases to ensure secure and consistent operations
- Implement and maintain monitoring and observability solutions using Prometheus, Grafana, and related tools
- Build and evolve CI/CD pipelines and progressive delivery strategies
- Collaborate closely with development teams to embed reliability and security best practices throughout the application lifecycle
- Participate in incident response, post-incident reviews, and continuous improvement initiatives, including resilience testing and chaos engineering
- Design and manage secure networking solutions, including AWS VPCs, Kubernetes networking, and firewalls

What we are looking for

Required qualifications

- 9+ years of commercial experience in SRE, systems engineering, infrastructure, or related roles
- At least 2 years of experience in a Tech Lead or similar leadership position
- University degree in Computer Science or a related field
- Strong hands-on experience with Kubernetes (AWS EKS or similar), including networking, scaling, and security
- Advanced knowledge of AWS services such as EKS, EC2, CloudWatch, Route53, Aurora, and S3
- Proven experience with Terraform, ArgoCD, and GitHub Actions
- Solid background in monitoring, observability, and incident management (Prometheus, Grafana)
- Strong scripting and automation skills in Python, Go, or Bash
- Availability to work standard hours 09:00–17:00 CET
- Willingness to actively participate in production support activities when required

Nice to have

- Experience with other cloud platforms such as GCP or Azure
- Familiarity with logging and observability stacks like ELK, Loki, or Graylog
- Experience with chaos engineering and resilience testing
- Knowledge of secrets management tools such as HashiCorp Vault or SOPS
- Experience working with databases, including setup, scaling, and optimisation
- Strong communication, mentoring, and coaching skills
| Published | 2 days ago |
| Expires | in 2 months |
| Contract type | B2B |
| Source |