What you'll be doing:
Model Development & Optimization
Fine-tuning & Engineering: Develop and fine-tune LLMs and classical ML models for specific medical tasks (condition detection, dialogue systems) using internal datasets.
Research to Production: Partner with Data Scientists to turn experimental code into scalable, production-ready modules.
Platform Engineering & Infrastructure
Pipeline Orchestration: Build and maintain complex ML workflows using Kubeflow Pipelines (KFP) or similar orchestration tools.
Internal Tooling: Develop and manage the internal Python ecosystem (libraries, SDKs, and utilities) that the Data Science team uses for daily development.
CI/CD & Automation: Write and maintain CI/CD scripts to automate the testing, versioning, and deployment of machine learning artifacts.
Production Standards & Integration
System Integration: Work with backend developers to integrate trained models/agents into the core application architecture.
Code Quality: Enforce high engineering standards through code reviews, ensuring that research code meets production reliability and maintainability requirements.
Requirements Engineering: Translate evolving data science requirements into concrete infrastructure and platform features.
Experience: 10+ years in software engineering with 5+ years in backend/platform roles.
Languages: Expert-level Python; proficiency in another language, such as C++, Rust, Java, or Go, is an advantage.
Cloud & Infra: 4+ years with GCP (preferred) or AWS, including Docker, Kubernetes, and pipelines (KFP/Vertex).
ML Core: Production experience with PyTorch, Transformers, and low-level libraries (CUDA).
LLM Stack: Experience with inference optimization (e.g., vLLM/NGC) and fine-tuning (Axolotl/Hugging Face).
Key Traits: Strong focus on code optimization, system reliability, and collaborative problem-solving.