We’re seeking a seasoned Data Engineer to build the data infrastructure that fuels Apollo, our groundbreaking intelligent agent.
You’ll play a crucial role in developing large-scale data-intensive systems that power Apollo’s capabilities.
What You’ll Do:
– Design and implement massively parallel processing (MPP) solutions for both real-time and batch scenarios
– Develop real-time stream processing solutions using technologies like Apache Kafka or Amazon Kinesis (see the streaming sketch after this list)
– Build infrastructure that brings machine learning capabilities to production
– Orchestrate containerized applications in cloud environments (AWS and GCP)
– Write production-grade Python code and work with various database systems
– Administer and design cloud-based data warehousing solutions
– Work with unstructured data, complex data sets, and perform data modeling
– Collaborate with cross-functional teams to integrate data solutions into our AI systems
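To give a flavor of the streaming work, here is a minimal sketch of the kind of consumer you might build. It assumes the kafka-python client, a local broker, and a hypothetical "agent-events" topic; none of these are specified in this posting, so treat it as an illustration rather than our actual pipeline.

    from json import loads

    from kafka import KafkaConsumer  # assumes the kafka-python package is installed

    # Hypothetical topic name and broker address; real values depend on the deployment.
    consumer = KafkaConsumer(
        "agent-events",
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda raw: loads(raw.decode("utf-8")),
        auto_offset_reset="earliest",
        enable_auto_commit=False,  # commit offsets explicitly once processing succeeds
    )

    # Each iteration yields one decoded JSON event from the stream; real-time
    # enrichment or feature extraction for the ML pipelines would happen here.
    for message in consumer:
        event = message.value
        print(event)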
Requirements:
– + years of experience building massively parallel processing (MPP) solutions (e.g., Spark, Presto; see the batch sketch after this list)
– 2+ years of experience developing real-time stream processing solutions (e.g., Apache Kafka, Amazon Kinesis)
– 2+ years of experience building ML infrastructure for production (e.g., Kubeflow, SageMaker, Vertex AI)
– Experience orchestrating containerized applications in AWS and GCP using EKS and GKE
– 3+ years of experience writing production-grade Python code
– Experience working with both relational and non-relational databases
– 2+ years of experience administering and designing cloud-based data warehousing solutions (e.g., Snowflake, Amazon Redshift)
– 2+ years of experience working with unstructured data, complex data sets, and data modeling
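As a rough illustration of the batch/MPP side of the role, the PySpark sketch below computes a simple daily aggregation. The input path, column names, and output location are hypothetical placeholders, not a description of our actual jobs.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("daily-event-counts").getOrCreate()

    # Hypothetical Parquet dataset with event_ts and event_type columns.
    events = spark.read.parquet("s3://example-bucket/events/")

    # Count events per day and type; Spark parallelizes this across the cluster.
    daily_counts = (
        events
        .groupBy(F.to_date("event_ts").alias("event_date"), "event_type")
        .count()
    )

    daily_counts.write.mode("overwrite").parquet("s3://example-bucket/daily-counts/")
    spark.stop()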
This position is open to all candidates.