Big Data Engineer
Remote | Kyiv, Kyiv, Ukraine | Technical
MGID is a global advertising platform helping brands reach unique local audiences at scale. At MGID, we empower brands and publishers to work together transparently through our privacy-first targeting technology, enabling advertisers to drive performance and awareness and publishers to retain and monetize their audiences. Today we are building unique technologies, and with your help we aim even higher.
- A proprietary high-load service that delivers 185 billion advertisements to 850 million unique users in more than 70 languages;
- The winner of multiple AdTech awards for innovation and product quality;
- A workforce of 600+ employees operating from offices in the US, Europe, and Asia;
- A passion for cutting-edge technology and a seamless vertical structure that allows the regional teams to exchange skills and development practices.
We are looking for a Big Data Engineer to work efficiently with large datasets and support our Data Science team in developing, improving, and delivering our ML and AI solutions and algorithms.
If you're passionate about building scalable data solutions and staying up-to-date with the latest industry trends and technologies, we want to hear from you!
Requirements:
- Proven experience in developing and optimizing PySpark applications.
- Strong knowledge of distributed computing principles and concepts.
- Practical experience working with large datasets using technologies such as Hadoop, Spark, and ClickHouse.
- Proficiency in programming languages such as Python and SQL.
- Experience with the Linux/Unix command-line interface.
- Familiarity with data visualization and dashboarding tools.
- Strong communication skills and the ability to work effectively in a remote team environment.
- Excellent problem-solving skills and attention to detail.
Will be a plus:
- Bachelor's or Master's degree in Computer Science or a related field.
- Practical experience with ClickHouse.
- Practical experience with stream processing and messaging systems such as Kafka.
- Practical experience with NoSQL databases (for example, MongoDB), especially Aerospike.
- Knowledge of the AdTech domain: an understanding of online advertising and real-time bidding (RTB).
- Familiarity with containerization technologies such as Docker and Kubernetes, and with cloud computing platforms.
- Familiarity with data governance and security best practices.
- Knowledge of machine learning concepts and frameworks.
Responsibilities:
- Collaborate with Data Scientists, Data Analysts, and other stakeholders to understand data needs and develop solutions.
- Design, develop, and optimize PySpark applications for processing and analyzing large sets of structured and unstructured data.
- Monitor and evaluate data to ensure accuracy and integrity; troubleshoot and debug PySpark code.
- Build and maintain data pipelines for ingesting, processing, and storing data, optimizing for performance and scalability.
- Develop and maintain data visualization dashboards and reports to enable insights and decision-making.
- Create and maintain tools and libraries for efficient data processing.
- Stay up-to-date with industry trends and new technologies to continuously improve data processing capabilities.