Staff Site Reliability Engineer - Big Data Hadoop (PRE)

VISA

Warsaw +1 more
Hybrid
Hadoop
🔄 DevOps
HBase
📊 Big Data SRE
Ansible
Spark
Shell Scripting
Kafka
🐍 Python

Requirements

Expected technologies

Hadoop

Optional technologies

Shell

Ansible

Python

Our requirements

  • As a Staff Site Reliability Engineer, you will play a key role in maintaining and supporting Visa’s Data Platform, ensuring the reliability and performance of critical Big Data systems.
  • You will drive innovation for our partners and clients globally by working on open-source Big Data clusters, optimizing their availability, efficiency, and scalability.
  • Master’s degree in Math, Science, Engineering, Computer Science, Information Systems, or a related field; OR
  • Bachelor’s degree in Math, Science, Engineering, Computer Science, Information Systems, or a related field, AND a minimum of five years of relevant experience; OR
  • A minimum of five years of experience working with Hadoop systems.

Optional

  • Experience in Big Data SRE and Engineering across open-source platforms such as Hadoop, Kafka, HBase, and Spark, with strong troubleshooting and debugging skills.
  • Proven ability to conduct effective root cause analysis of major production incidents, document findings, and implement high-availability solutions for critical services.
  • Expertise in capacity planning, system expansions, and timely upgrades to mitigate scaling challenges, while automating repetitive tasks to reduce manual effort and prevent errors.
  • Ability to fine-tune alerting and set up observability tools to proactively identify and resolve performance issues, collaborating with Level-3 teams on use case reviews and cluster hardening.
  • Strong documentation skills to create standard operating procedures and platform utilization guidelines, ensuring consistency and efficiency in operations.
  • Proficiency in leveraging DevOps tools and industry best practices, including incident, problem, and change management disciplines.
  • Commitment to ensuring Hadoop platform performance meets service-level agreements, with experience in security remediation, automation, and self-healing implementations.
  • Experience in developing automation tools and reports to streamline processes, using technologies such as Shell scripting, Ansible, Python, or other programming languages.

Your responsibilities

Hadoop/Big-Data:

Sound knowledge of managing large-scale Hadoop platforms, including monitoring the platform, debugging issues, and tuning cluster performance. In-depth knowledge of the Hadoop ecosystem, including ZooKeeper, HDFS, YARN, Hive, Spark, Trino, and Kafka. Proven experience in debugging issues on both the Hadoop platform and the applications running on it. Familiarity with security tools such as Kerberos and Ranger, and with Active Directory integrations. Experience with cloud technologies, preferably AWS EMR. Knowledge of Kubernetes, AI, and MLOps is an advantage.

Collaboration and Teamwork:

Collaborate closely with Level-3 teams to review new use cases and implement cluster hardening techniques, ensuring the development of robust and reliable platforms. Foster cross-team collaboration, building and maintaining strong relationships with customer teams, user communities, architects, and engineering teams. Work jointly on key deliverables to ensure production scalability and stability.

Published: a day ago
Expires: in 13 days
Work mode: Hybrid
