Location: Ramat Gan, Israel (hybrid model)
What will you do?
Own & Evolve Benchmarking – Design, build, and maintain Immunai's benchmarking suite for foundation models and multimodal AI systems.
Define Core Abstractions – Create clean, extensible abstractions and APIs for datasets, tasks, models, metrics, and evaluation workflows.
Develop Metrics & Evaluations – Implement metrics that capture predictive performance, biological relevance, and multimodal alignment.
Support Model Development – Work closely with AI scientists and data scientists to integrate new models, tweak architectures, and enable rapid, fair iteration.
Bring in New Models & Baselines – Add external and internal models to benchmarks and ensure meaningful comparisons.
Explore Data When Needed – Dive into data and results to debug evaluations, understand model behavior, and unblock modeling work.
Enable Rigor & Reproducibility – Ensure evaluations are consistent, well-versioned, and trustworthy over time.
Requirements
BSc, MSc, or PhD in Computer Science, Software Engineering, or a related field
Strong software engineering skills with experience designing maintainable, modular systems
Hands-on experience working with ML models and evaluation pipelines
Proficiency in Python and modern ML ecosystems
Ability to read, modify, and debug deep learning models
Experience with benchmarks, metrics, or evaluation frameworks – preferred
Familiarity with foundation models or multimodal learning – preferred
Comfort navigating complex datasets and doing targeted exploratory analysis
Experience in biomedical or other data-intensive domains – a plus