Your Career
The ideal candidate enjoys working in a fast-paced environment with highly innovative technologies.

Your Impact
Provision, configure, and support resilient hybrid cloud deployment architectures using the automation framework.
Collaborate with development teams to ensure applications are production-ready, scalable, and reliable from the outset.
Manage the CI/CD platform and Linux infrastructure, and collaborate with other SREs to deploy and maintain the automation framework, perform capacity planning, and create and review operational runbooks.
Set up critical infrastructure and develop tools and frameworks to automate operational tasks, including the deployment of machines, services, and applications.
Participate in the Incident Command on-call rotation supporting critical applications and services.
Conduct root cause analysis of critical business and production issues and drive future preventive measures.
Manage scalability, capacity planning, redundancy, and resiliency.
Maintain service availability and performance SLAs based on business and product requirements.
Contribute to documentation related to design, deployment, validation, and operations.
Design proactive service monitoring, alerting, and trend analysis of underlying infrastructure, and support the operations team in implementation.
Establish end-to-end monitoring and alerting on all critical components of the application.

Requirements:
Your Experience
6+ years of system engineering experience on mission-critical, enterprise-level systems.
6+ years of experience using Infrastructure-as-Code to build large-scale environments, mainly on Linux platforms (Ubuntu, SUSE, CentOS).
3+ years of experience working with cloud environments, primarily Google Cloud Platform.
Demonstrated Linux/systems experience in a hybrid (cloud, on-prem) environment.
Strong experience with CI/CD pipelines, GitHub, Jenkins, and Artifactory.
Strong foundation in Linux operating systems, troubleshooting, design, and implementation.
Expertise in configuration management with a framework such as Terraform, Ansible, or Helm.
Experience with Linux vulnerability management processes and patching.
Programming knowledge in Python, Bash, Perl, or Go to automate infrastructure workflows.
Understanding of software development methodologies and practices, including agile development, continuous integration, and continuous delivery.
Understanding of network firewalls, load balancers, and complex network designs.
Experience with monitoring technologies such as Datadog, Nagios, Graphite, Cacti, and Grafana.
Understanding of Kubernetes, the container lifecycle, and troubleshooting.
Hands-on knowledge of high-availability approaches such as load balancing, failover, clustering, and disaster recovery.
Excellent problem-solving, critical thinking, communication, and teamwork skills.
Passion, drive, energy, a sense of humor, and a great attitude.
This position is open to all candidates.















