Our team is looking for a Deep Learning Engineer.
We are one of the few companies to have trained multi-billion-parameter Large Language Models (LLMs), a feat that involves the most advanced engineering (large-scale distributed training on thousands of cores). Serving these LLMs efficiently requires cutting-edge technology as well. As a deep learning engineer on the team, you will be responsible for maintaining and improving our training infrastructure; developing, scaling, and testing new ideas; and adapting our code to run on, and make the best use of, the newest and most advanced hardware accelerators.
Role and Responsibilities:
Develop Large Language Models as part of our applied research projects and in support of our Platform, including designing, implementing and training massive-scale deep language models
Implement, optimize, scale and test new cutting-edge ideas and architectures
Perform large-scale evaluations and comparisons of trained models across a range of benchmarks, and add support for new benchmarks
Requirements:
B.Sc. in computer science, software engineering or equivalent
Self-learner with a proven record of removing technical roadblocks
5+ years of experience developing software for production systems and/or internal infrastructure/tools
Prior experience working with cloud computing platforms and container tooling (e.g. AWS, GCP, Docker, Kubernetes)
Skilled at writing production-grade Python code
Hands-on experience in deep learning and machine learning (e.g. TensorFlow, PyTorch)
Any one of the following:
Optimization of deep learning model training (e.g. parallelization, Megatron, DeepSpeed, FSDP)
or
Custom kernel experience (C++/CUDA and/or Triton)
or
Distributed systems, in particular distributed deep learning training/serving
This position is open to all candidates.