Deep Learning Engineer / LLM Engineer
Full-Time
Tel Aviv
Who are you?
You are a passionate and driven individual who strives to be your best every day. You have extensive experience building infrastructure that enables training, fine-tuning, and serving billion-parameter-scale deep learning models, especially in the NLP domain, using PyTorch and the Hugging Face ecosystem.
What you'll be doing:
Build, train, and fine-tune large language models using PyTorch on advanced hardware setups such as GPUs and TPUs, employing CUDA in multi-node, multi-GPU environments.
Develop robust serving APIs that deliver sub-second-latency inference for large language models, applying various optimization techniques.
Continuously improve model performance by fine-tuning LLMs, embedding models, and rankers to meet specific application needs.
Requirements:
2+ years of experience working with large-scale PyTorch-based deep learning applications on GPUs and TPUs, using CUDA in multi-node, multi-GPU scenarios
2+ years of experience building training and fine-tuning pipelines for large language models (LLMs) using distributed training approaches for both model and data parallelism
2+ years of experience building serving APIs for sub-second-latency inference of large language models using various optimization techniques
Extensive experience with PyTorch, DeepSpeed, Megatron-LM, and the Hugging Face ecosystem
Proven track record of fine-tuning embedding models and re-rankers
1+ years of experience with model training optimization and distributed training, using libraries such as Hugging Face Accelerate, bitsandbytes, and FlashAttention to improve training efficiency and scalability
Experience managing machine learning experiments and monitoring model performance with tools such as Weights & Biases and MLflow
Familiarity with embedding-model inference libraries (e.g., Infinity, Text Embeddings Inference) and LLM inference libraries (e.g., vLLM, Text Generation Inference)
Experience with Keras and JAX/Flax is an advantage
Experience with advanced fine-tuning methods such as RLHF, DPO, KTO, and ORPO is an advantage
Experience with parameter-efficient fine-tuning methods such as LoRA, DoRA, ReFT, and IA3 is an advantage
This position is open to all candidates.