Responsibilities:
Apply software engineering methods and best practices (e.g., the Medallion Architecture) to Data Lake management; see the pipeline sketch after this list.
Develop data pipelines for the ingestion and transformation of data.
Manage and enhance orchestrated pipelines across the company.
Data Modeling: Developing data models to ensure efficient storage, retrieval, and processing of data, often involving schema design for NoSQL databases, data lakes, and data warehouses.
Scalability: Ensuring that the big data architecture can scale horizontally to handle large volumes of data and high traffic loads.
Data Security: Implementing security measures to protect sensitive data, including encryption, access controls, and data masking (see the masking sketch after this list).
Data Governance: Establishing data governance policies to ensure data quality, compliance with regulations, and data lineage.
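For context, here is a minimal sketch of the bronze/silver/gold flow that the Medallion Architecture describes, assuming PySpark with Delta Lake available; all paths, table names, and columns are hypothetical:

```python
# Minimal Medallion (bronze/silver/gold) sketch; assumes Spark + Delta Lake.
# Paths and column names below are illustrative, not a real deployment.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("medallion-sketch").getOrCreate()

# Bronze: land raw events unchanged, adding only ingestion metadata.
raw = spark.read.json("s3://lake/raw/orders/")  # hypothetical source path
(raw.withColumn("_ingested_at", F.current_timestamp())
    .write.format("delta").mode("append").save("s3://lake/bronze/orders"))

# Silver: enforce types and basic quality rules (this is where schema
# design for the lake shows up in practice).
silver = (spark.read.format("delta").load("s3://lake/bronze/orders")
          .select(F.col("order_id").cast("string"),
                  F.col("amount").cast("double"),
                  F.col("created_at").cast("timestamp"))
          .filter(F.col("order_id").isNotNull())
          .dropDuplicates(["order_id"]))
silver.write.format("delta").mode("overwrite").save("s3://lake/silver/orders")

# Gold: business-level aggregates ready for downstream consumers.
gold = (silver.groupBy(F.to_date("created_at").alias("day"))
              .agg(F.sum("amount").alias("revenue")))
gold.write.format("delta").mode("overwrite").save("s3://lake/gold/daily_revenue")
```

The layering is the point: raw data stays replayable in bronze, quality rules live in one place in silver, and consumers only ever read gold.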
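On the security side, masking can be as simple as hashing a PII column before a table is exposed downstream; the email column here is a hypothetical addition to the silver table sketched above:

```python
# Pseudonymize a (hypothetical) email column with a one-way hash, then drop
# the plaintext so downstream readers never see it.
masked = (silver.withColumn("email_hash", F.sha2(F.col("email"), 256))
                .drop("email"))
masked.write.format("delta").mode("overwrite").save("s3://lake/silver/orders_masked")
```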
Requirements:
5 years of experience in Python (plus Java / Scala / Kotlin / Go / Node.js)
Experience in NoSQL and data storage technologies (e.g., Elasticsearch, Redis, MongoDB, Couchbase, BigQuery, Snowflake, Databricks)
Experience in data formats (Parquet, Avro, ORC)
Experience in Big Data technologies (Spark, MapReduce)
Experience in streaming (Spark Streaming, Flink, Kafka Streams, Beam, etc.); see the streaming sketch after this list
Experience in messaging (Kafka, RabbitMQ, etc.)
Experience with Machine Learning (Spark ML, TensorFlow, etc.) – a plus
Experience in cloud platforms (AWS, GCP)
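As referenced in the streaming item above, a hedged sketch that ties the streaming, messaging, and data format items together: Spark Structured Streaming consuming a Kafka topic and landing Parquet files. The broker address, topic, and paths are assumptions, and the spark-sql-kafka connector must be on the classpath:

```python
# Sketch: consume a Kafka topic with Spark Structured Streaming and land
# Parquet files. Broker, topic, and paths are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("stream-sketch").getOrCreate()

events = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")  # assumed broker
          .option("subscribe", "orders")                     # assumed topic
          .load()
          # Kafka values arrive as bytes; cast to string before parsing.
          .select(F.col("value").cast("string").alias("payload")))

query = (events.writeStream
         .format("parquet")
         .option("path", "s3://lake/bronze/orders_stream")
         .option("checkpointLocation", "s3://lake/_chk/orders_stream")
         .trigger(processingTime="1 minute")
         .start())
query.awaitTermination()
```

The checkpoint location is what makes the job restartable without reprocessing or dropping messages, which is the practical core of the streaming and messaging experience asked for here.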