As a Data Engineer, you will own the architecture and optimization of large-scale ETL processes that transform raw heavy-duty vehicle telemetry into production-grade intelligence. You will operate at the intersection of Big Data and AI, building scalable pipelines, enforcing data quality standards, and managing cost-efficiency for a system processing billions of time-series records. You will be a technical owner, collaborating directly with Data Scientists to ensure our fleet intelligence models run reliably in production.
What You'll Do
Architect and build robust ETLs and scalable data pipelines on Databricks and AWS.
Optimize high-throughput ingestion workflows for billions of time-series records, ensuring low latency and data integrity.
Engineer data validation frameworks and automated monitoring to proactively detect anomalies before they impact models.
Drive cost-efficiency by tuning Spark jobs and managing compute resources in a high-volume environment.
Transform raw IoT/telemetry signals into structured, enriched Feature Stores ready for production Machine Learning.
Define best practices for data engineering, CI/CD for data, and lakehouse architecture across the organization.
Requirements:
Production Experience: 3+ years in Data Engineering with strong proficiency in Python, SQL, and PySpark.
Big Data Architecture: Proven track record working with distributed processing frameworks (Spark, Delta Lake) and cloud infrastructure (AWS preferred).
Scale: Experience handling high-volume datasets (TB scale or billions of rows); familiarity with time-series or IoT data is a strong advantage.
Engineering Rigor: Deep understanding of data structures, orchestration (Databricks Workflows), and software engineering best practices (Git, CI/CD).
Problem Solving: Ability to diagnose complex performance bottlenecks in distributed systems and implement cost-effective solutions.
Ownership: A self-starter mindset with the ability to take a vague requirement and deliver a deployed, production-ready pipeline.
This position is open to all candidates.











