Senior Site Reliability Engineer

EPAM Systems (Poland) sp. z o.o.

Kraków, Grzegórzki (+1 more location)
Hybrid
🐍 Python
JavaScript
Java
ServiceNow
Splunk
Git
GitLab
☁️ Microsoft Azure
☁️ Azure Kubernetes Service
🚢 Kubernetes
🐳 Docker

Requirements

Expected technologies

  • Python
  • JavaScript
  • Java
  • ServiceNow
  • Splunk
  • Git
  • GitLab
  • Microsoft Azure
  • Azure Kubernetes Service

Operating system

  • Linux

Our requirements

  • Minimum of 3 years of programming experience in Python, JavaScript, or Java
  • At least 3 years of DevOps experience, including building and troubleshooting pipelines
  • Proficiency in automation using Python or other scripting languages
  • Knowledge of Unix administration
  • Familiarity with ITIL processes
  • Experience using ServiceNow for operational support
  • Experience with Azure Log Analytics and query languages such as KQL or Splunk's SPL
  • Hands-on experience implementing monitoring solutions
  • Experience with Git-based version control, preferably GitLab
  • Previous experience in L2/L3 application or infrastructure support
  • Working knowledge of containers and Azure Kubernetes Service
  • Strong experience with the Microsoft Azure platform
  • Demonstrated ability to deploy enterprise applications using infrastructure as code

Your responsibilities

  • Drive reliability improvements in production systems to reduce escalations and enable faster feature development
  • Handle support escalations with a thorough understanding of the environment, code, and logs
  • Manage incident response, change management, and business continuity activities
  • Analyze and document system issues from business and technical perspectives
  • Identify and implement solutions and system improvements, including automation of manual tasks
  • Collaborate with product managers, developers, quality analysts, and support teams to support project delivery and onboarding
  • Provide regular updates to management on system status and issues
  • Develop technical fixes and scripts to support operational needs
  • Investigate problems to determine root cause and provide workarounds
  • Create and maintain known error documentation
  • Own the lifecycle of problem resolution
  • Perform daily system monitoring and troubleshoot production issues
  • Support and configure global production environments
  • Manage release processes for UAT and production environments
  • Document support procedures, releases, and troubleshooting guides
  • Provide coverage on weekdays and weekends as needed

Published: 1 day ago
Expires: in 13 days
Work mode: Hybrid