The Senior Machine Learning Operations (MLOps) Engineer is responsible for providing operational support for the machine learning team’s model deployments, including development, deployment, and benchmarking.
This role also supports machine learning model development and ensures the seamless integration and optimization of models.
Role and Responsibilities:
Manage and support the deployment of machine learning models in production environments, ensuring high availability and reliability.
Develop and maintain CI/CD pipelines for machine learning models to streamline the deployment process.
Implement monitoring and alerting systems to track model performance and detect anomalies in real time.
Oversee the infrastructure used for model training and deployment, ensuring scalability and efficient resource utilization.
Conduct benchmarking of machine learning models to evaluate performance and identify areas for improvement.
Work closely with data scientists and engineers to integrate models into applications and optimize performance.
Manage model versioning and maintain a clear history of changes and updates.
Create and maintain documentation for deployed models, processes, and best practices.
Ensure the security of model deployments by implementing appropriate access controls and monitoring for vulnerabilities.
Implement tools and practices for tracking experiments and model performance metrics.
Requirements:
Master's or higher-level degree in Machine Learning preferred.
Strong understanding of CI/CD processes and infrastructure-as-code tools such as Terraform.
Strong understanding of AWS services.
5+ years of experience with Python and/or C++.
Knowledge of machine learning frameworks such as PyTorch.
Advantage:
Experience with NVIDIA CUDA and hardware deployments.
Experience with NVIDIA DeepStream and Triton frameworks.
Experience with AWS.
Experience with NVIDIA CUDA programming.
This position is open to all candidates.