Large Scale Training Engineer – LTX Model
About the Role
As a Large Scale Training Engineer, you will play a key role in improving the training throughput of our internal framework and enabling researchers to pioneer new model concepts. This role demands excellent engineering skills for designing, implementing, and optimizing cutting-edge AI models, along with writing robust machine learning code and a deep understanding of supercomputer performance. Your expertise in performance optimization, distributed systems, and debugging will be crucial, as our framework runs extensive computations across numerous virtual machines.
This role is designed for individuals who are not only technically proficient but also deeply passionate about pushing the boundaries of AI and machine learning through innovative engineering and collaborative research.
Key Responsibilities
Profile and optimize the training process to ensure efficiency and effectiveness, including optimizing multimodal data pipelines and data storage methods.
Develop high-performance TPU/GPU/CPU kernels and integrate advanced techniques into our training framework to maximize hardware efficiency.
Utilize knowledge of hardware features to make aggressive optimizations and advise on hardware/software co-designs.
Collaboratively develop model architectures with researchers that facilitate efficient training and inference.
Design, maintain, and evolve a high-quality, shared codebase that emphasizes correctness, readability, extensibility, testing, and long-term maintainability, while balancing performance requirements.
Requirements
Industry experience with small- to large-scale ML experiments and multimodal ML pipelines.
Strong software engineering skills, with proficiency in Python and experience with modern C++.
Deep understanding of GPU, CPU, TPU, or other AI accelerator architectures.
Enjoy diving deep into system implementations to improve performance without compromising code quality and maintainability.
Passion for driving ML large-scale training workloads efficiently and optimizing compute kernels.
You are encouraged to apply if you meet 3 out of the 5 core qualifications above and are motivated to grow in the remaining areas.
Nice to have
Background in JAX/Pallas, Triton, CUDA, OpenCL, or similar technologies.
Familiarity with Kubernetes-based environments for running and scaling large-scale workloads.
This position is open to all candidates.