We're looking for an experienced, highly motivated SRE team lead to guide our SRE team in applying methodologies and technologies that deliver highly scalable and available production environments.
As an SRE team lead, you will lead a small but growing team. You will have the freedom to explore and implement the newest technologies while leading and mentoring the team. You will be responsible for designing and implementing monitoring and alerting infrastructure and defining the right metrics for a highly available production environment. You will learn new things every day and be constantly challenged.
Responsibilities
Lead and mentor the SRE team to design and implement reliable, highly available, and scalable production monitoring infrastructure.
Explore and implement new technologies, from POC through to production.
Ensure high uptime and reliability of the production environment.
Perform root cause analysis for complex failures and offer modern solutions and tools.
Analyze performance and stability issues.
Collaborate closely with DevOps, R&D, product, and support teams to define cross-organizational processes.
Design, develop, and drive troubleshooting and mitigation tools as part of a self-healing agenda.
Requirements:
At least 4 years of experience as an SRE or in a DevOps role
At least 2 years of experience leading a team or as a tech leader
Proven monitoring and alerting experience (ELK, Grafana, Prometheus, etc.)
Deep expertise in Kubernetes, container orchestration, and cloud infrastructure (AWS, Azure, or GCP).
Experience with a programming language (Python, Java, Go, Ruby, etc.)
Scripting and automation skills (Bash, Python, etc.)
Networking skills
Experience with IaC (infrastructure as code) tools such as Terraform
This position is open to all candidates.