The ideal candidate has experience building data pipelines and data transformations, and enjoys optimizing data flows and building them from the ground up. They must be self-directed and comfortable supporting multiple production implementations for various use cases.
Responsibilities
Implement and maintain production data pipeline flows within the system
Design and implement solution-based data flows for specific use cases, so that implementations can be applied within the product
Build machine learning data pipelines
Create data tools that help analytics and data science team members build and optimize our product into an innovative industry leader
Work with product, R&D, data, and analytics experts to improve functionality across our systems
Train customer data scientists and engineers to maintain and amend data pipelines within the product
Travel to customer locations both domestically and abroad
Build and manage technical relationships with customers and partners
Requirements
4+ years of experience as a data engineer – must
Hands-on experience with Apache Spark, using PySpark or Scala – must
Hands-on experience with SQL – must
Hands-on experience with version-control tools such as Git – must
Hands-on experience with data transformation, validation, cleansing, and ML feature engineering – must
BSc degree or higher in Computer Science, Statistics, Informatics, Information Systems, Engineering, or another quantitative field – must
Strong analytic skills related to working with structured and semi-structured datasets – must
Business-oriented and able to work with external customers and cross-functional teams – must
Fluent in English – must