As part of your role, you would improve and establish new monitoring, alerting and observability of services using a wide range of tools.
Additionally, you would handle critical alerts and incidents and work directly with DevOps teams to improve and optimize availability.
Responsibilities:
Own the production infrastructure over Public and Private Cloud, On-Premise and internal systems.
Research production workflows, identify issues and optimization opportunities, and improve monitoring.
Help identify the root causes of incidents and prevent them from recurring, including publishing RCAs.
Improve and establish alerting for our infrastructure, services and business logic.
Communicate and escalate issues to senior management in R&D, DevOps and Support.
Requirements:
At least 3 years of experience in DevOps, SRE or Infra Backend roles
At least 2 years of experience with alerting and monitoring systems such as Datadog or Site24x7
Experience running distributed systems deployed across multiple geographies
Solid knowledge of networking and internet technologies – e.g. HTTP servers, DNS, firewalls and proxies
Experience working with Linux and Windows systems
Experience with Docker, Kubernetes and Helm
Experience with cloud systems such as AWS or Azure
Familiarity with databases, web hosting and automation
An innovative approach, with the ability to quickly learn technologies
Strong analytical and troubleshooting skills – ability to solve complex problems
Fast learner, able to take a project from POC to production while handling decision-making and communications
This position is open to all candidates.