Manager SRE Team (Cortex)

Posted more than a month ago

Posted online

We are looking for a motivated SRE Manager to join our global DevOps group in our Tel Aviv R&D center. The group is responsible for the reliability and availability of the production environment hosting Cortex products, and for enabling the entire Cortex R&D group through CI tools, infrastructure, and automation. In this role you will be part of a DevOps group that plans, executes, and reports on infrastructure and code projects, and that manages and carries out high-pressure production maintenance work and incident response.
The candidate will be a hands-on manager with an established background in operations and cloud infrastructure, a track record of hiring and developing extraordinary talent, strong technical ability, great communication skills, and the motivation to achieve results in a dynamic, fast-paced environment.
In this role, you will have full accountability for leading and managing a skilled team of Site Reliability Engineers, responsible for maintaining and enhancing our infrastructure, ensuring the resilience of our systems, and driving operational excellence.
Your Impact:
Team Leadership – Lead, mentor, and develop a team of SREs, fostering a culture of collaboration, innovation, and accountability
Reliability and Availability – Take ownership of the reliability and availability of our production environments, ensuring uninterrupted service to our users
Operational Efficiency – Drive initiatives that optimize operational processes, reduce downtime, and enhance system performance, such as postmortems, RCAs, and remediation processes
Own monitoring processes, continuously improve alerts and metrics, and work with the development teams to improve their applications' SLOs
Manage and maintain optimal on-call rotations and shifts – Define escalation paths and take ownership of major incidents
Give management visibility into our SLOs and SLAs
Cloud Expertise – Utilize your expertise in cloud platforms, with a strong emphasis on GCP, to optimize our infrastructure and leverage cloud-native technologies
Scripting and Automation – Demonstrate high proficiency in scripting languages, with a preference for Python, to automate routine tasks and processes
Technology Evaluation – Stay up-to-date with cutting-edge technologies, evaluating their potential impact on our operations, and implementing them when appropriate.

Requirements:
Leadership – A minimum of 3 years of experience leading SRE or Operations teams supporting large-scale production environments
5+ years in SRE, DevOps, or Operations roles
Cloud Proficiency – High proficiency in the GCP ecosystem
Monitoring – Understanding of SRE concepts such as alert improvement, SLIs, SLOs, and avoiding alert fatigue
Scripting Skills – Strong scripting skills, particularly in Python
Containerization – Experience with virtualized and containerized environments, including Kubernetes and Docker
Infrastructure-as-Code – Familiarity with IaC tools such as Terraform
Communication – Excellent communication and interpersonal skills, with the ability to collaborate effectively across teams
Adaptability – A knack for quickly grasping new technologies and the ability to manage multiple responsibilities simultaneously
Service Reliability – Experience navigating the complexities of business and service reliability.

This position is open to all candidates.
