You'll work with cutting-edge technologies such as vLLM, Triton, SageMaker, ClearML, Karpenter, KEDA, and EKS, balancing performance, scalability, and cost.
What You'll Do
Deploy and manage LLMs and deep learning models using vLLM, Triton Inference Server, and custom API endpoints (a minimal vLLM sketch follows this list).
Build and maintain GPU-aware autoscaling clusters using AWS EKS, Karpenter, and KEDA, optimizing for cost-efficiency and performance (see the autoscaling sketch below).
Develop CI/CD pipelines using Jenkins and GitHub Actions to automate ML model delivery and application deployments.
Orchestrate training, fine-tuning, and inference jobs on AWS SageMaker and ClearML, with support for experiment tracking, versioning, and reproducibility (see the experiment-tracking sketch below).
Support backend teams in deploying app artifacts and runtime environments; implement rollback and release strategies.
Integrate observability tooling (e.g., Prometheus, Grafana, ELK, or OpenTelemetry) for both infrastructure and model performance.
Collaborate with SREs to enforce high availability, disaster recovery, and incident response procedures for mission-critical AI services.
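To give a flavor of the serving work, here is a minimal sketch of running inference through vLLM's offline Python API. The model name, prompt, and sampling settings are illustrative placeholders, not a reference to any specific deployment.

```python
# Minimal vLLM inference sketch (model name and parameters are illustrative).
from vllm import LLM, SamplingParams

# Load a model onto the available GPU(s); tensor_parallel_size shards the
# model across multiple GPUs when set higher than 1.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", tensor_parallel_size=1)

# Sampling settings control generation length and randomness.
params = SamplingParams(temperature=0.7, max_tokens=128)

# Batch generation: vLLM schedules prompts together for GPU efficiency.
outputs = llm.generate(["Summarize what an inference server does."], params)
for output in outputs:
    print(output.outputs[0].text)
```

In production the same engine is typically exposed through vLLM's OpenAI-compatible HTTP server rather than the offline API shown here.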
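For the autoscaling work, a sketch of registering a KEDA ScaledObject through the official Kubernetes Python client. The namespace, Deployment name, Prometheus address, metric name, and threshold are all assumptions for illustration.

```python
# Sketch: create a KEDA ScaledObject that scales a GPU inference Deployment
# on request rate. Names, namespace, and the Prometheus query are illustrative.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod

scaled_object = {
    "apiVersion": "keda.sh/v1alpha1",
    "kind": "ScaledObject",
    "metadata": {"name": "vllm-scaler", "namespace": "ml-serving"},
    "spec": {
        "scaleTargetRef": {"name": "vllm-server"},  # hypothetical Deployment
        "minReplicaCount": 1,
        "maxReplicaCount": 8,
        "triggers": [{
            "type": "prometheus",
            "metadata": {
                "serverAddress": "http://prometheus.monitoring:9090",
                "query": 'sum(rate(vllm_request_count[1m]))',  # assumed metric
                "threshold": "20",
            },
        }],
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="keda.sh", version="v1alpha1",
    namespace="ml-serving", plural="scaledobjects",
    body=scaled_object,
)
```

KEDA handles pod-level scaling; Karpenter then provisions GPU nodes to fit the new pods, so the two tools split pod-level and node-level responsibilities.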
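And for the orchestration item, a minimal ClearML experiment-tracking sketch. Project, task, and hyperparameter names are placeholders.

```python
# Sketch: register a training run with ClearML for tracking and reproducibility.
# Project, task, and hyperparameter names are placeholders.
from clearml import Task

task = Task.init(project_name="llm-finetuning", task_name="lora-run-01")

# Connected hyperparameters are versioned with the run and editable in the UI.
hparams = {"learning_rate": 2e-5, "epochs": 3, "base_model": "llama-3.1-8b"}
task.connect(hparams)

# ... training loop would go here ...

# Scalars logged this way appear as live plots in the experiment view.
task.get_logger().report_scalar(
    title="loss", series="train", value=0.42, iteration=1
)
```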
What You'll Bring
6+ years of experience in DevOps, MLOps, or infrastructure roles with a focus on ML model delivery.
Proven hands-on experience deploying GPU-based models (LLMs, vision, transformers) using vLLM or Triton.
Deep knowledge of AWS EKS and Kubernetes, with practical experience configuring Karpenter and KEDA to autoscale GPU workloads.
Experience building pipelines with Jenkins and GitHub Actions, and managing releases for ML and application codebases.
Familiarity with AWS SageMaker, ClearML, or similar platforms for ML orchestration and experimentation.
Strong scripting and automation skills in Python and Bash, plus working knowledge of containerization (Docker).
Solid grasp of networking, IAM, and cloud security fundamentals.