What will you do?
You will analyze large volumes of printer logs that arrive several times a day as zip files in Azure Blob Storage. The logs currently require substantial cleaning and transformation. Logs from various files are aggregated and combined to calculate and present KPIs that are important to the business.
analyze and organize raw data
evaluate business needs and objectives
develop the existing data lake
implement notebooks for ELT processes and business logic (see the sketch after this list)
optimize code
refactor existing code
write unit tests
prepare documentation
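As a rough picture of the day-to-day work, here is a minimal PySpark sketch of such an ELT flow. All paths, the storage account, column names, and the example KPI are hypothetical placeholders, and it assumes the zip archives have already been extracted into a raw container (Spark cannot read .zip files natively):

```python
# Minimal ELT sketch, assuming extracted CSV logs in a raw (bronze) container.
# Paths, storage account, schema, and the KPI are all hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("printer-log-kpis").getOrCreate()

raw = (
    spark.read.option("header", "true")
    .csv("abfss://bronze@<storage-account>.dfs.core.windows.net/printer-logs/")
)

# Cleaning: drop malformed rows and normalize timestamps.
clean = (
    raw.dropna(subset=["printer_id", "event_time", "event_type"])
       .withColumn("event_time", F.to_timestamp("event_time"))
       .withColumn("event_date", F.to_date("event_time"))
)

# Example KPI: daily error rate per printer.
kpis = (
    clean.groupBy("printer_id", "event_date")
         .agg(
             F.count("*").alias("events"),
             F.sum(F.when(F.col("event_type") == "ERROR", 1).otherwise(0)).alias("errors"),
         )
         .withColumn("error_rate", F.col("errors") / F.col("events"))
)

# Persist the aggregated (gold) layer as a Delta table partitioned by date.
(kpis.write.format("delta").mode("overwrite")
     .partitionBy("event_date")
     .save("abfss://gold@<storage-account>.dfs.core.windows.net/printer-kpis/"))
```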
Requirements:
2 years of experience developing big data platforms.
1 year of experience developing in Databricks – mandatory.
Azure knowledge (must have): Azure Data Factory, ADLS Gen2, Event Hub / IoT Hub / Stream Analytics, and networking in Azure (VNet / subnets / gateway / private endpoints).
Databricks knowledge – PySpark or Spark SQL.
Languages:
– Advanced SQL knowledge (query optimization, partitioning, clustering, etc.) – T-SQL, ANSI SQL; partitioning is illustrated in the sketch after the requirements list
– Intermediate Python knowledge
– Git knowledge
– Basic knowledge of NoSQL databases
Roles and security (Microsoft Entra ID, security groups, etc.) – advantage.
Basic to intermediate Snowflake knowledge – advantage.
Data warehousing knowledge: design, data modelling, methodologies (e.g. Kimball).
Data Lake and Lakehouse concepts: medallion architecture, etc.
Big Data concepts: Hive, Spark, partitioning, scaling up and out, stream processing.
Basic DevOps knowledge:
– Azure DevOps (Boards, tasks, creating PRs, etc.)
– Basic knowledge of CI/CD processes
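To illustrate the partitioning knowledge listed above, here is a hypothetical Spark SQL snippet (runnable from a Databricks notebook); the table and columns are invented for the example:

```python
# Hypothetical Spark SQL sketch: partitioning a table so that date-filtered
# queries can prune partitions instead of scanning the whole table.
spark.sql("""
    CREATE TABLE IF NOT EXISTS printer_events (
        printer_id STRING,
        event_type STRING,
        event_time TIMESTAMP,
        event_date DATE
    )
    USING DELTA
    PARTITIONED BY (event_date)
""")

# A filter on the partition column lets Spark read only the matching
# partitions (partition pruning) rather than the full table.
daily_errors = spark.sql("""
    SELECT printer_id, COUNT(*) AS errors
    FROM printer_events
    WHERE event_date = DATE'2024-01-01'
      AND event_type = 'ERROR'
    GROUP BY printer_id
""")
```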
This position is open to all candidates.