We're looking for a Data Engineer to join our team and help shape a modern, scalable data platform. You'll work with cutting-edge AWS technologies, Spark, and Iceberg to build pipelines that keep our data reliable, discoverable, and ready for analytics.
What's the Job?
Design and maintain scalable data pipelines on AWS (EMR, S3, Glue, Iceberg).
Transform raw, semi-structured data into analytics-ready datasets using Spark.
Automate schema management, validation, and quality checks.
Optimize performance and costs with smart partitioning, tuning, and monitoring.
Research and evaluate new technologies, proposing solutions that improve scalability and efficiency.
Plan and execute complex data projects with foresight and attention to long-term maintainability.
Collaborate with engineers, analysts, and stakeholders to deliver trusted data for reporting and dashboards.
Contribute to CI/CD practices, testing, and automation.
Requirements:
Strong coding skills in Python (PySpark, pandas, boto3).
Experience with big data frameworks (Spark) and schema evolution.
Knowledge of lakehouse technologies (especially Apache Iceberg).
Familiarity with AWS services: EMR, S3, Glue, Athena.
Experience with orchestration tools like Airflow.
Solid understanding of CI/CD and version control (GitHub Actions).
Ability to research, evaluate, and plan ahead for new solutions and complex projects.
Nice to have:
Experience with MongoDB or other NoSQL databases.
Experience with stream processing (e.g., Kafka, Kinesis, Spark Structured Streaming).
Ability to build dashboards and work with Looker (Enterprise).
Experience with infrastructure-as-code (Terraform).
Strong debugging and troubleshooting skills for distributed systems.
This position is open to all candidates.