We are seeking an experienced engineer to work on distributed AI/ML systems. This role involves working on collective operations – the fundamental operations that enable AI to scale across multiple accelerators & servers. Most of our stack is C/C++ and relatively low level, so solid knowledge of Linux, kernels, and performant code is important. Experience with embedded systems is valued, and experience with high-speed networking or HPC interconnects is valued highly.
If you like solving hard problems, want to work with HPC and ML customers, and want to iterate fast and deliver meaningful solutions at scale, then come join us! This role is truly at the forefront of AI/ML: you'll be working on features for the largest clusters, with the largest customers, for the largest AI models.
Requirements:
Basic Qualifications:
– 5+ years of non-internship professional software development experience.
– 5+ years of programming experience with at least one programming language.
– 5+ years of experience leading design or architecture (design patterns, reliability and scaling) of new and existing systems.
– 5+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations.
– Experience as a mentor, tech lead or leading an engineering team.
Preferred Qualifications:
– Master's degree in computer science or equivalent.
This position is open to all candidates.