Staff Site Reliability Engineer - Big Data Hadoop (PRE)

VISA

Warsaw +1 more

Hybrid

Hadoop

🔄 DevOps

HBase

📊 Big Data SRE

Ansible

Spark

Shell Scripting

Kafka

🐍 Python

Requirements

Expected technologies

Hadoop

Optional technologies

Shell

Ansible

Python

Our requirements

As a Staff Site Reliability Engineer, you will play a key role in maintaining and supporting Visa’s Data Platform, ensuring the reliability and performance of critical Big Data systems.
You will drive innovation for our partners and clients globally by working on open-source Big Data clusters, optimizing their availability, efficiency, and scalability.
Master’s degree in Math, Science, Engineering, Computer Science, Information Systems, or a related field; OR
Bachelor’s degree in Math, Science, Engineering, Computer Science, Information Systems, or a related field, AND a minimum of five years of relevant experience; OR
A minimum of five years of experience working with Hadoop systems.

Optional

Experience in Big Data SRE and Engineering across open-source platforms such as Hadoop, Kafka, HBase, and Spark, with strong troubleshooting and debugging skills.
Proven ability to conduct effective root cause analysis of major production incidents, document findings, and implement high-availability solutions for critical services.
Expertise in capacity planning, system expansions, and timely upgrades to mitigate scaling challenges, while automating repetitive tasks to reduce manual effort and prevent errors.
Ability to fine-tune alerting and set up observability tools to proactively identify and resolve performance issues, collaborating with Level-3 teams on use case reviews and cluster hardening.
Strong documentation skills to create standard operating procedures and platform utilization guidelines, ensuring consistency and efficiency in operations.
Proficiency in leveraging DevOps tools and industry best practices, including incident, problem, and change management disciplines.
Commitment to ensuring Hadoop platform performance meets service-level agreements, with experience in security remediation, automation, and self-healing implementations.
Experience in developing automation tools and reports to streamline processes, using technologies such as Shell scripting, Ansible, Python, or other programming languages.

Your responsibilities

Hadoop/Big-Data:

Sound knowledge of managing large-scale Hadoop platforms, including monitoring the platform, debugging issues, and tuning cluster performance.
In-depth knowledge of the Hadoop ecosystem, including ZooKeeper, HDFS, YARN, Hive, Spark, Trino, and Kafka.
Proven experience in debugging issues on both the Hadoop platform and the applications running on it.
Familiarity with security tools such as Kerberos and Ranger, and with Active Directory integration.
Experience with cloud technologies, preferably AWS EMR.
Knowledge of Kubernetes, AI, and MLOps is an advantage.

Collaboration and Teamwork:

Collaborate closely with Level-3 teams to review new use cases and implement cluster hardening techniques, ensuring the development of robust and reliable platforms.
Foster cross-team collaboration by building and maintaining strong relationships with customer teams, user communities, architects, and engineering teams.
Work jointly on key deliverables to ensure production scalability and stability.

Published	a day ago
Expires	in 13 days
Work mode	Hybrid