Solution Architect – Site Reliability (SRE) & Observability | f/m/d

ERGO Technology & Services S.A.

Warsaw

Solution Architect

Site Reliability Engineering

Observability

IaC

Terraform

☁️ Azure

🚢 Kubernetes

📊 Databricks

English fluency

German (nice to have)

Summary

Solution Architect – SRE & Observability (full-time) in Warsaw/Gdańsk. Responsible for the strategic vision, design, and governance of observability, IaC, and monitoring, as well as mentoring SRE teams and negotiating with vendors. Requires strong experience with SRE, observability tools, Terraform, and Azure/Kubernetes, plus fluent English; German is a nice-to-have.

Keywords

Solution Architect, Site Reliability Engineering, Observability, IaC, Terraform, Azure, Kubernetes, Databricks, English fluency, German (nice to have)

Benefits

•medical package
•sports card and sports sections
•flexible working hours
•confidential employee assistance program
•option to work remotely
•gaming room and dog-friendly office in Warsaw
•workshops and training courses, hackathons, meetups
•e-learning platforms and language courses
•CSR activities
•bike races, soccer matches, film marathons in the company cinema
•diverse and inclusive work environment

Job description

What you will do

As a Solution Architect, you will be responsible for defining the strategic direction of the Site Reliability Engineering (SRE) service, including observability and monitoring. This role focuses on making architectural decisions, designing integrations, ensuring best practices, and advising SRE engineers and customer teams on how to automate their service operations and use observability tools (e.g. Datadog) effectively.

How you will get the job done

defining the strategic vision for site reliability engineering, observability and platform engineering and planning tactical steps for implementation
leading the design and governance of automated service operations and observability tooling, ensuring scalability, security, and cost efficiency
scouting and analysing new observability features – matching them to business needs and notifying the engineers about potential improvements
designing collaboration, automation and integration models
defining standards and best practices for automated service operations and the observability framework, including alerting, SLOs, and distributed tracing across digital products
configuring, integrating, administering, and maintaining observability for all relevant digital products, using Infrastructure as Code (IaC)
ensuring comprehensive monitoring coverage across digital products
supporting, advising, and coaching SRE engineers on the best ways to automate service operations and use observability tools
supporting SRE engineers in troubleshooting and optimizing monitoring configurations
guiding and mentoring engineers in implementing provisioning and configuration of observability tools using Infrastructure as Code
engaging with the observability tool vendors to discuss complex technical issues and feature enhancements
answering technical questions from product teams
negotiating technical aspects of observability tools during procurement discussions to ensure optimal setup

What we offer

Let's be healthy – medical package, sports card, and numerous sports sections – these are some of the benefits that help our employees stay in good shape.

Let's be balanced – work-life balance is a key aspect of a healthy workplace. We offer our employees flexible working hours, a confidential employee assistance program, and the possibility of remote working. However, staying at home won’t be easy with our in-office gaming room and dog-friendly office in Warsaw.

Let's be smart – we organize numerous workshops and training courses. Thanks to hackathons and meetups, our specialists share their expertise with others. Additionally, we have a wide range of digital learning platforms and language courses.

Let's be responsible – each year, we participate in several CSR activities, during which, together with our colleagues, we do our best to create a better future.

Let's be fun – company-wide bike races and soccer matches, film marathons in our cinema room, and other engaging team-building activities – we've got it covered!

Let's be diverse – every team member is valued, regardless of gender, nationality, religious beliefs, disability, age, or sexual orientation or identity. Your qualifications, experience, and mindset are our greatest benefit!

Requirements

fluency in English
strong Site Reliability Engineering (SRE), Platform Engineering and Observability Architecture experience
expertise in observability tools (architecture, governance, integrations, APM, security best practices) and automating service operations
strong Infrastructure as Code (IaC) knowledge and experience (e.g. Terraform)
experience designing log management, APM, infrastructure monitoring, and synthetic testing solutions
knowledge of distributed tracing, metrics, and telemetry collection
familiarity with cloud environments (Azure, Kubernetes, Databricks)
strong strategic thinking and vision-setting for observability and reliability
excellent stakeholder communication and coaching abilities
experience negotiating with vendors and external service providers
ability to lead and mentor engineers, ensuring effective implementation of observability tooling