We are looking for someone to play a key role in developing our cloud infrastructure, which runs on more than 30 Kubernetes clusters across GCP and AWS and uses the latest technologies to produce state-of-the-art GenAI.
To that end, we rely on the most advanced technologies available today to support our R&D efforts and to deliver highly complex products to our end users efficiently.
Role and Responsibilities
Develop and implement solutions tailored for R&D teams with a focus on high quality and scale
Manage large, highly scaled cloud environments built with the latest infrastructure-as-code technologies
In collaboration with R&D teams, troubleshoot and isolate issues at all levels of the stack, from code to infrastructure, in development and production
Identify and resolve problems in mission-critical services, automating remediation procedures to prevent recurrences
Identify, adopt, and integrate technologies that add new capabilities to our infrastructure and benefit the organization
Remediate issues affecting the cost, health, performance, and stability of our production systems and infrastructure
Measure and monitor the availability, latency, and overall health of the production system in order to maintain production services
Provide operational support for activities such as deploying services and configuring the interactions between them
Requirements
3+ years of experience in a DevOps/SRE role
2+ years of experience with public cloud, preferably GCP or AWS
3+ years of experience with Kubernetes
Extensive experience with shell scripting, Helm, and Terraform
Experience with GitOps tools such as ArgoCD and Crossplane
Experience with GitHub and GitHub Actions
Experience with networking, distributed systems, and SQL and NoSQL databases
Experience with Unix/Linux operating system internals and administration
Experience maintaining production services and analyzing and troubleshooting systems
Commitment to a collaborative environment infused with professionalism, integrity, passion, and accountability
Experience writing in high-level languages such as Python or Go is an advantage