The role: We are looking for a Data Engineer to design, build, and maintain scalable data pipelines and infrastructure that power our data-driven decision-making. You will be part of the infrastructure development team and serve as the professional focal point for data engineering, guiding and influencing best practices across different domains. You will collaborate with data scientists, analysts, and other engineering teams to integrate and optimize data workflows, ensuring the availability of accurate, reliable, and secure data. If you are passionate about building robust data systems and thrive in a collaborative environment, we'd love to hear from you!
Responsibilities:
* Design, deploy, and manage data pipelines and storage solutions in cloud environments, particularly AWS.
* Design, develop, and maintain scalable and efficient ETL pipelines using Python.
* Integrate data from various sources (e.g., databases, APIs, cloud storage) into a unified data warehouse or data lake.
* Design, implement, and manage databases, data warehouses, and data lakes.
* Ensure database optimization and performance tuning for efficient data retrieval and storage.
* Implement data validation and cleaning processes to ensure the accuracy and quality of data.
* Work closely with data scientists and software engineering teams to ensure seamless data integration into applications and workflows.
* Continuously improve data infrastructure to support scalability and high-performance data processing.
* Automate recurring data-related tasks and workflows using scripting languages (e.g., Python, Bash) or tools (e.g., Apache Airflow).
* Proactively monitor and troubleshoot issues related to data pipelines, databases, and infrastructure.
Requirements:
* 3+ years of experience in data engineering or a related field
* 3+ years of experience working with data science teams to integrate and manage data workflows
* Proficiency in Python for data manipulation and scripting
* Strong knowledge of SQL for querying and data management
* Experience with AWS services (e.g., S3, Redshift, EMR, Glue, Athena, Lambda) for cloud-based data processing and storage
* Experience with Apache Spark for large-scale data processing
* Experience with ETL pipelines (designing, building, and maintaining)
* Experience with Apache Airflow for workflow automation and orchestration
Advantages:
* Familiarity with Docker for containerization and deployment
* Experience with AWS CDK for defining cloud infrastructure as code
* Knowledge of BI tools (e.g., Tableau, Looker, Power BI) for data visualization and reporting
* Experience with Apache Iceberg for managing large datasets in cloud environments
* Experience with NoSQL databases (e.g., MongoDB, DynamoDB)
At BeeHero you have the opportunity to be:
* Impactful: Your work will directly impact agricultural practices around the world.