ML Engineer – AI Infra Group
Tel Aviv | Full-time
We are looking for someone who is passionate about building intuitive, production-grade AI infrastructure. The AI Infra group builds scalable, high-performance AI systems for internal users and external customers, designed to run seamlessly across cloud and on-premise environments on the latest hardware.
Responsibilities
Design and optimize LLM serving infrastructure using inference engines (vLLM, TensorRT-LLM, Triton Inference Server)
Implement and tune distributed inference strategies including tensor parallelism, pipeline parallelism, and multi-node serving
Develop and apply model compression techniques to optimize cost, latency, and memory footprint while maintaining model quality
Build self-service fine-tuning platforms that enable data scientists to run experiments (LoRA, QLoRA, full fine-tuning) in a standardized, reproducible, and governed manner
Optimize inference performance through batching strategies, KV-cache tuning, and speculative decoding
Develop reusable APIs, abstractions, and platform services for model deployment, scaling, and lifecycle management
Collaborate with AI researchers and product teams to productionize models and meet latency/throughput requirements
Evaluate and benchmark new model architectures, compression methods, and serving frameworks
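To give a flavor of the optimization work above, here is a back-of-the-envelope KV-cache sizing sketch; the function name and the model figures are illustrative, not part of the role description:

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Approximate KV-cache size for a decoder-only transformer.

    Counts the K and V tensors (hence the factor of 2) cached per layer;
    bytes_per_elem=2 assumes fp16/bf16 cache entries.
    """
    return (2 * num_layers * batch_size * seq_len
            * num_kv_heads * head_dim * bytes_per_elem)

# A 7B-class model (32 layers, 32 KV heads, head_dim 128) at a 4096-token
# context needs about 2 GiB of KV cache per sequence in fp16:
print(kv_cache_bytes(32, 32, 128, 4096) / 2**30)  # → 2.0
```

Numbers like these are what drive the batching, KV-cache tuning, and quantization decisions listed above: cutting KV heads via grouped-query attention or halving precision directly multiplies how many concurrent sequences fit on a GPU.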
Requirements:
5+ years of experience in software engineering or ML engineering, with a significant focus on ML systems or backend infrastructure
Strong proficiency in Python and deep learning frameworks (PyTorch)
Hands-on experience with LLM inference engines (vLLM, TensorRT-LLM, Triton Inference Server)
Deep understanding of transformer architectures and LLM-specific optimizations (attention mechanisms, KV-cache, quantization techniques like GPTQ, AWQ, GGUF)
Experience with distributed training/fine-tuning frameworks (Ray, DeepSpeed, FSDP)
Ability to build developer-facing tools and platforms with clear APIs and documentation
Understanding of GPU performance profiling and optimization
Familiarity with LLM evaluation methodologies and benchmarking
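As an illustration of why LoRA-style fine-tuning (mentioned above) matters for a self-service platform, a quick parameter count; the function name and dimensions are illustrative:

```python
def lora_trainable_params(d_in, d_out, rank):
    """Trainable parameters a LoRA adapter adds to one weight matrix:
    a (rank x d_in) down-projection A plus a (d_out x rank) up-projection B.
    """
    return rank * (d_in + d_out)

full = 4096 * 4096                            # dense 4096x4096 projection
lora = lora_trainable_params(4096, 4096, rank=8)
print(lora, f"{100 * lora / full:.2f}%")      # 65536 params, ~0.39% of the full matrix
```

Training well under 1% of a layer's parameters is what lets many data scientists share GPU capacity for experiments in a standardized, reproducible way.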
This position is open to all candidates.