2+ years of experience with Linux and solid knowledge of it (bash, threads, IPC, filesystems); being a power user is strongly desired, as is understanding how the OS works so you can benefit from performance optimizations in production and in daily workflows
1+ years of experience with Spark, primarily using Python (PySpark) or Scala, for Big Data processing (including an understanding of how Spark works and why)
Very solid understanding of Spark basics, building blocks, and mechanics (see the sketch after this list for a taste of what we mean)
Very solid understanding of Python (especially PySpark)
A strong drive to shape future projects, improve things, and take risks (be ready to back this up with examples)
Great communication skills (you can drive projects end to end and guide dev team members)
Professional working proficiency in English (both oral and written)
Understanding of HTTP API communication patterns (REST/RPC) and the HTTP protocol itself
Good software debugging skills: not only print statements, but also using a debugger
Deep understanding of at least one technical area (please tell us which one it is and prepare your biggest battle story about it)
Good working knowledge of Git
If you don't have all the qualifications but you're interested in what we do and have a solid understanding of Linux, let's talk!
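To give a flavour of what "understanding Spark mechanics" means here, below is a minimal, purely illustrative PySpark sketch (the app name and data are made up): a narrow transformation like filter runs per-partition, a wide one like groupBy forces a shuffle, and explain() lets you see that in the physical plan.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("mechanics-demo").getOrCreate()

df = spark.createDataFrame([("a", 1), ("b", 2), ("a", 3)], ["key", "value"])

narrow = df.filter(F.col("value") > 1)            # narrow: per-partition, no shuffle
wide = narrow.groupBy("key").agg(F.sum("value"))  # wide: triggers a shuffle (Exchange)

# The physical plan shows where shuffles happen; at PB scale the
# Exchange nodes are usually where the time (and money) goes.
wide.explain()
```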
Offer description
Correct Context is looking for a PySpark Big Data Developer to work with Comscore, in Poland and beyond.
Comscore is a global leader in media analytics, revolutionizing insights into consumer behavior, media consumption, and digital engagement.
Comscore leads in measuring and analyzing audiences across diverse digital platforms. You'll thrive on cutting-edge technology, play a vital role as a trusted partner delivering accurate data to global businesses, and collaborate with industry leaders like Facebook, Disney, and Amazon. You'll help empower businesses across the media, advertising, e-commerce, and technology sectors in the digital era.
We offer:
Real big data projects (PB scale) 🚀
An international team (US, PL, IE, CL) 🌎
A small, independent team working environment 🧑‍💻
High influence on your working environment
A hands-on environment
Flexible work time ⏰
Fully remote or in-office work in Wroclaw, Poland 🏢
12,000 - 18,000 PLN net/month B2B 💰
Private healthcare (PL) 🏥
Multikafeteria (PL) 🍽️
Free parking (PL) 🚗
The recruitment process for the PySpark Big Data Developer position has the following steps:
Technical survey - 10 min
Technical screening - 30 min video call
Technical interview - 60 min video call
Final interview (technical/managerial) - 30 min video call
Your responsibilities
Design, implement, and maintain petabyte-scale Big Data pipelines using Python, PySpark, Apache Airflow, Kubernetes, and a lot of other tech
Optimize: working with Big Data is very specific. Depending on the process it can be IO- or CPU-bound, and we need to figure out faster ways of doing things. At least an empirical feel for computational complexity is needed, because in Big Data even simple operations become costly once multiplied by the size of the dataset (see the sketch after this list)
Conduct Proof of Concept (PoC) work for enhancements
Write clean, performant Big Data Python code
Cooperate with other Big Data teams
Work with technologies like AWS, Kubernetes, Airflow, EMR, Hadoop, Linux/Ubuntu, Kafka, and Spark
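To make the optimization point concrete, here is a minimal, purely illustrative PySpark sketch (the bucket paths and column names are hypothetical, not from a real Comscore pipeline). A plain join of a huge table to a tiny lookup table defaults to a sort-merge join that shuffles both sides; broadcasting the small side ships it to every executor and leaves the big table where it is.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("join-optimization").getOrCreate()

# Hypothetical inputs: a huge fact table and a small lookup table.
events = spark.read.parquet("s3://some-bucket/events/")        # petabyte scale
countries = spark.read.parquet("s3://some-bucket/countries/")  # a few MB

# Broadcasting the small table avoids shuffling `events` at all.
joined = events.join(F.broadcast(countries), on="country_code", how="left")

# A BroadcastHashJoin in the plan confirms the big shuffle is gone.
joined.explain()
```

The same per-operation cost multiplied by a petabyte-scale dataset is exactly why a one-line hint like this can save hours of cluster time.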