SRE Engineer

Posted more than a month ago

Published online

We are looking for an experienced Site Reliability Engineer (SRE) with a passion for cloud-native system observability and a track record of implementing state-of-the-art monitoring solutions that offer comprehensive insights.
As an SRE Engineer, you will be instrumental in driving the adoption of progressive delivery practices, ensuring the deployment of robust and reliable systems with minimal operational disruptions.
Responsibilities:
Master the art of cloud-native system observability by identifying and deploying monitoring tools and solutions that provide deep operational insights, ensuring the reliability and performance of cloud infrastructure.
Champion progressive delivery methods, employing strategies and technologies that enable the smooth and reliable deployment of systems, minimizing downtime and operational friction.
Live and breathe system metrics, utilizing data to drive significant improvements across the platform. Your knack for translating complex data into actionable plans will be key to enhancing system reliability and performance.
Commit to maintaining high system uptime, rigorously meeting or exceeding Service Level Agreements (SLAs) and Service Level Objectives (SLOs), as measured by Service Level Indicators (SLIs), ensuring the platform remains highly available and performant.
Adopt a proactive approach to system optimization, continuously seeking opportunities to improve infrastructure before issues arise, enhancing system efficiency and reducing the likelihood of unexpected downtime.
Work closely with Engineering, DevOps, and Product teams to integrate observability and reliability best practices into the architectural and infrastructure design, ensuring security and performance from the ground up.
Lead and contribute to the design and support of best-in-class integrations with third-party partners, vendors, and clients, alongside Architects, Developers, and System and Security Owners.
Train and educate the Technology team on SRE principles, tools, and best practices.
Respond to and manage incidents with a focus on rapid recovery and minimizing impact, utilizing insights gained to prevent future occurrences.

Requirements:
Implement Advanced Observability Frameworks: Design and deploy comprehensive observability systems to monitor the health, performance, and reliability of cloud-native applications. Utilize advanced tools for logging, metrics collection, and event monitoring to ensure deep visibility into system operations.
Deep knowledge of cloud platforms (AWS, GCP, Azure) and experience with cloud-native technologies.
Deep understanding of Kubernetes infrastructure.
Proficiency in monitoring tools (Datadog, Prometheus, Grafana) and experience in setting up comprehensive monitoring and alerting systems.
Excellent problem-solving skills and the ability to work under pressure to resolve incidents and ensure system reliability.
Progressive Delivery Expertise: Leverage progressive delivery techniques such as canary releases (Argo Rollouts) – BIG advantage.
Tracing and Debugging: Manage distributed tracing systems (Datadog APM / Jaeger / OpenTelemetry) to diagnose and troubleshoot complex issues across microservices architectures. Employ effective logging and tracing strategies to pinpoint root causes of incidents and performance bottlenecks – BIG advantage.
Programming and Scripting Skills: Proficiency in programming and scripting languages such as Python, Go, and Bash – MUST.
Good presentation skills: Ability to articulate technically advanced issues to all audiences; ability to mentor and train internal staff.
Strong organizational skills and excellent attention to detail.
Ability to effectively prioritize and execute tasks.
Self-driven.
Excellent English.

This position is open to all candidates.
