As a Senior Big Data Engineer within our Foundation Entity Group, you will play a pivotal role in designing, developing, and maintaining the data infrastructure that powers our location analytics platform.
RESPONSIBILITIES:
Data Pipeline Architecture and Development: Design, build, and optimize robust and scalable data pipelines to process, transform, and integrate large volumes of data from various sources into our analytics platform.
Data Quality Assurance: Implement data validation, cleansing, and enrichment techniques to ensure high-quality and consistent data across the platform.
Performance Optimization: Identify performance bottlenecks and optimize data processing and storage mechanisms to enhance overall system performance and reduce latency.
Cloud Infrastructure: Work extensively with cloud-based technologies on GCP to design and manage scalable data infrastructure.
Collaboration: Collaborate with cross-functional teams including Data Analysts, Data Scientists, Product Managers, and Software Engineers to understand requirements and deliver solutions that meet business needs.
Data Governance: Implement and enforce data governance practices, ensuring compliance with relevant regulations and best practices related to data privacy and security.
Monitoring and Maintenance: Monitor the health and performance of data pipelines, troubleshoot issues, and ensure high availability of data infrastructure.
Mentorship: Provide technical guidance and mentorship to junior data engineers, fostering a culture of learning and growth within the team.
REQUIREMENTS:
Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
5+ years of professional experience in software development, with at least 3 years as a Big Data Engineer.
Spark expertise (mandatory): Strong proficiency in Apache Spark, including hands-on experience with building data processing applications and pipelines using Spark’s core libraries.
PySpark/Scala (mandatory): Proficiency in either PySpark (Python API for Spark) or Scala for Spark development.
Data Engineering: Proven track record in designing and implementing ETL pipelines, data integration, and data transformation processes.
Cloud Platforms: Hands-on experience with cloud platforms such as AWS, GCP, or Azure.
SQL and Data Modeling: Solid understanding of SQL, relational databases, and data modeling.
Big Data Technologies: Familiarity with big data technologies beyond Spark, such as Hadoop ecosystem components, data serialization formats (Parquet, Avro), and distributed computing concepts.
Programming Languages: Proficiency in programming languages like Python, Java, or Scala.
ETL Tools and Orchestration: Familiarity with ETL tools and frameworks, such as Apache Airflow.
Problem-Solving: Strong analytical and problem-solving skills.
Collaboration and Communication: Effective communication skills and collaboration within cross-functional teams.
Geospatial Domain (Preferred): Prior experience in the geospatial or location analytics domain is a plus.
This position is open to all candidates.