We are looking for a talented and motivated Software Engineer to join our newly formed team developing orchestration tools and platforms for AI datacenters.
The main goal of this team is to create customer-focused orchestration solutions that simplify the deployment, management, and optimization of large-scale AI workloads across a full datacenter stack including switches, hosts, smart NICs, GPUs, ROCm, and RCCL.
You will work on the design and development of orchestration systems that bridge compute, networking, and AI acceleration domains, primarily using Python and modern full-stack technologies.
Key Responsibilities
* Design and develop software components for orchestration platforms managing AI datacenter infrastructure.
* Implement control and coordination mechanisms for compute, network, and AI acceleration resources.
* Develop backend services, APIs, and UI components using Python and modern full-stack frameworks.
* Collaborate with cross-functional teams including networking, GPU, and system software to integrate orchestration capabilities across multiple layers.
* Participate in architecture discussions, code reviews, and continuous integration processes.
* Contribute to testing, validation, and performance improvements of orchestration systems.
* Engage with product and customer teams to translate operational needs into effective software solutions.
Requirements:
Required Qualifications
* 3+ years of experience in software development, preferably in infrastructure, orchestration, or systems software.
* Strong proficiency in Python, including experience with backend or orchestration frameworks.
* Familiarity with datacenter or cloud infrastructure, including networking, compute, or storage systems.
* Experience with containers and orchestration platforms (Docker, Kubernetes).
* Solid understanding of software engineering principles, including design patterns, testing, and CI/CD.
* Strong collaboration and communication skills, with the ability to work in a multidisciplinary environment.
Preferred Qualifications
* Exposure to AI workloads and GPU ecosystems (ROCm, RCCL, PyTorch, TensorFlow).
* Experience with distributed systems, control-plane software, or cluster management frameworks.
* Familiarity with REST/gRPC APIs, microservices, and cloud-native architectures.
* Background in monitoring, telemetry, or resource scheduling systems.
* Practical experience in full-stack development (React, Angular, Node.js, or equivalent).
* Experience with test automation frameworks (pytest, Robot Framework, etc.).
This position is open to all candidates.