Required: Cloud Observability Service Manager
Job Description:
As the Cloud Observability Service Manager, you will play a critical role in ensuring the reliability and performance of our customers' cloud-based applications and services. You will lead a team of observability engineers responsible for monitoring, alerting on, and optimizing our customers' cloud infrastructure and applications.
Key Responsibilities:
Mentor a team of observability engineers, setting clear objectives and providing regular feedback and support.
Develop and implement strategies for cloud observability, ensuring proactive monitoring and alerting to prevent and mitigate issues.
Collaborate with cross-functional teams to define and implement best practices for cloud observability and incident response.
Design and maintain dashboards, metrics, and logs that provide real-time insight into the health of our customers' cloud infrastructure.
Work closely with DevOps and SRE teams to optimize system performance and reliability.
Drive automation and scripting efforts to enhance observability tooling and reduce manual tasks.
Stay up to date with industry trends and emerging technologies in cloud observability, recommending and implementing improvements as needed.
Participate in on-call rotations and incident response activities as necessary.
Requirements:
Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent work experience).
Proven experience in cloud observability, monitoring, and alerting in a cloud-native environment (e.g., AWS, Azure, GCP).
Strong leadership and management skills, with a track record of building and developing high-performing teams.
Proficiency with observability tools such as Prometheus, Grafana, Datadog, New Relic, the ELK Stack, or similar technologies.
Experience with infrastructure as code (IaC) and automation tools (e.g., Terraform, Ansible, Puppet) is a plus.
Knowledge of containerization technologies (e.g., Docker, Kubernetes) and microservices architecture.
Excellent problem-solving and communication skills.
Ability to work collaboratively in a fast-paced and agile environment.
Relevant certifications (e.g., AWS Certified DevOps Engineer, Google Cloud Professional Cloud DevOps Engineer) are a plus.
This position is open to all candidates.