our company has been at the forefront of the deep learning revolution, pioneering innovations that have transformed the entire field. As the leading provider of GPUs and AI computing platforms, our cpmpany has empowered researchers and engineers worldwide to accelerate breakthroughs in artificial intelligence.
We seek a versatile Senior Software Engineer who is passionate about performance optimization and generative AI. Our team builds software solutions that enable efficient inference on the latest and greatest generative AI models. We tackle problems on all levels of the stackfrom server-level request batching to GPU kernel fusionand collaborate with teams across diverse disciplines to push our company's hardware to its full potential.
What youll be doing:
Cooperate with research teams to onboard new LLMs and VLMs into our company's opensource AI runtimes
Optimize inference workloads using sophisticated profiling and simulation tools
Build SOLID, extendable inference software systems, and refine robust APIs
Implement and debug low-level GPU code to harness the latest HW features
Own end-to-end inference acceleration features and work with teams around the world to deliver production-grade products.
We seek a versatile Senior Software Engineer who is passionate about performance optimization and generative AI. Our team builds software solutions that enable efficient inference on the latest and greatest generative AI models. We tackle problems on all levels of the stackfrom server-level request batching to GPU kernel fusionand collaborate with teams across diverse disciplines to push our company's hardware to its full potential.
What youll be doing:
Cooperate with research teams to onboard new LLMs and VLMs into our company's opensource AI runtimes
Optimize inference workloads using sophisticated profiling and simulation tools
Build SOLID, extendable inference software systems, and refine robust APIs
Implement and debug low-level GPU code to harness the latest HW features
Own end-to-end inference acceleration features and work with teams around the world to deliver production-grade products.
Requirements:
B.Sc., M.Sc. or equivalent experience in Computer Science or Computer Engineering
5+ years of relevant hands-on software engineering experience
Profound knowledge of software design principles
Strong proficiency in at least one system and one scripting language
Strong grasp of machine learning concepts
People person with excellent communication skills that enjoys collaboration and teamwork.
Ways to stand out from the crowd:
Familiarity with our company's DL software stack, e.g. Triton Inference Server, TensorRT-LLM, and Model Optimizer
Proven track record of performance modeling, profiling, debugging, and development in a performance-critical setting with our company's accelerators.
Familiarity with LLM quantization, fine-tunning, and caching algorithms
Proficiency in GPU kernel programming (CUDA or OpenCL)
Prior experience working on a large software project with 50+ contributors.
B.Sc., M.Sc. or equivalent experience in Computer Science or Computer Engineering
5+ years of relevant hands-on software engineering experience
Profound knowledge of software design principles
Strong proficiency in at least one system and one scripting language
Strong grasp of machine learning concepts
People person with excellent communication skills that enjoys collaboration and teamwork.
Ways to stand out from the crowd:
Familiarity with our company's DL software stack, e.g. Triton Inference Server, TensorRT-LLM, and Model Optimizer
Proven track record of performance modeling, profiling, debugging, and development in a performance-critical setting with our company's accelerators.
Familiarity with LLM quantization, fine-tunning, and caching algorithms
Proficiency in GPU kernel programming (CUDA or OpenCL)
Prior experience working on a large software project with 50+ contributors.
This position is open to all candidates.