We are on an expedition to find you: someone passionate about creating intuitive, out-of-this-world, production-grade AI systems and ML pipelines to join our AI group. You'll be responsible for designing, building, deploying, and maintaining production-grade AI systems and ML pipelines. You'll work closely with data scientists to translate research into scalable solutions and manage model deployment in both cloud and on-prem GPU environments.
Responsibilities:
Design, build, and deploy production-grade ML models, AI agents, and end-to-end pipelines across cloud and on-prem GPU environments.
Maintain and optimize ML systems for performance, scalability, and reliability, including model validation, inference speed, and resource efficiency.
Develop monitoring and observability tools such as alerts and performance metrics to ensure system stability in production.
Create and integrate APIs for ML services within microservice-based architectures.
Drive adoption of best practices for CI/CD, observability, and reproducibility in ML systems.
Requirements:
3+ years of experience delivering production-grade ML/AI systems
Strong Python skills and solid understanding of the ML lifecycle
Experience with GPU infrastructure, containerization (Docker) and cloud platforms
Familiarity with microservice architectures and API development
Hands-on experience with LLM pipelines and agent orchestration frameworks (LangGraph, LlamaIndex, etc.)
Knowledge of experiment tracking tools (Weights & Biases, MLflow, or similar)
Background in scalable ML infrastructure, distributed computing, and workflow orchestration frameworks (Ray, Kubeflow, Airflow)
Experience with multi-node training (advantage)
Collaborative mindset with startup-level ownership and pragmatism
This position is open to all candidates.