Alexander Hägele

I’m a PhD Student in Machine Learning at EPFL in the MLO group, where I am supervised by Prof. Martin Jaggi, and a core contributor to the Apertus project. I just finished the inaugural Anthropic Fellowship in AI Safety in London, working with Jascha Sohl-Dickstein. In my current research, I am exploring:

Understanding scaling behaviour & methods for scaling language models
Training, optimization and algorithms for large (and small!) models
The interplay between data, optimization, and performance

and more :)

Prior to my PhD, I completed my Master’s and Bachelor’s in Computer Science at ETH Zürich, working with the LAS group and Prof. Andreas Krause. I was also a visiting Student Researcher at the Apple MLR team in Paris, hosted by Prof. Marco Cuturi.
During my studies, I had the chance to spend two semesters abroad: one at École Polytechnique in France, and one at the University of Toronto in Canada. I also did an internship at Spacemaker AI; it is fun to look back at the interview I gave here.

For EPFL students: If you are interested in a project (check out our MLO project page for guidelines and directions), please feel free to reach out via mail. I’m very happy to supervise motivated students! Please note, though, that I sometimes receive too many emails and may fail to reply (or take quite some time).

Feel free to get in touch via mail:
alexander.hagele [at] epfl [dot] ch

Publications

Full list also on Google Scholar.

Apertus: Democratizing Open and Compliant LLMs for Global Language Environments.
Alejandro Hernández-Cano, Alexander Hägele, […], Antoine Bosselut, Martin Jaggi, Imanol Schlag.
The biggest fully open and compliant training run and LLM to date.
[PDF | Huggingface | bibtex | Pretrain Code | Pretrain Data | Posttrain Code | Posttrain Data | Evals]

Inverse Scaling in Test-Time Compute.
Aryo Pradipta Gema, Alexander Hägele, Runjin Chen, Andy Arditi, Jacob Goldman-Wetzler, Kit Fraser-Taliente, Henry Sleight, Linda Petrini, Julian Michael, Beatrice Alex, Pasquale Minervini, Yanda Chen, Joe Benton, Ethan Perez.
Preprint.
[PDF | bibtex]

Training Dynamics of the Cooldown Stage in Warmup-Stable-Decay Learning Rate Scheduler.
Aleksandr Dremov, Alexander Hägele, Atli Kosson, Martin Jaggi.
TMLR, 2025.
[PDF | bibtex | Code]

The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training.
Fabian Schaipp, Alexander Hägele, Adrien Taylor, Umut Simsekli, Francis Bach.
ICML 2025.
[PDF | bibtex | Code]

Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations.
Alexander Hägele, Elie Bakouch, Atli Kosson, Loubna Ben Allal, Leandro Von Werra and Martin Jaggi.
NeurIPS 2024. Spotlight Award at NeurIPS’24.
Also Spotlight Presentation at the Workshop on Next Generation of Sequence Models (NGSM) and Best Poster Award at the Workshop on Efficient Systems for Foundation Models (ES-FOMO) at ICML’24.
[PDF | bibtex | Code | Slides]

BaCaDI: Bayesian Causal Discovery with Unknown Interventions.
Alexander Hägele, Jonas Rothfuss, Lars Lorch, Vignesh Ram Somnath, Bernhard Schölkopf and Andreas Krause.
AISTATS 2023.
Also presented at 1st Workshop on Causal Representation Learning at UAI 2022 (Link).
Oral presentation at AISTATS 2023 (notable paper award), ranked top 32 among 1689 submissions (top 1.9%).
[PDF | bibtex | Code | Slides]

Robustness Certification with Generative Models.
Matthew Mirman, Alexander Hägele, Pavol Bielik, Timon Gehr and Martin Vechev.
Proceedings of the 42nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2021).
[PDF | bibtex | DOI]

News

September 2025: We finally released the Apertus models and published the technical report.
April 2025: I’m in London for 6 months, working on the Anthropic Fellowship in AI Safety.
December 2024: Gave a talk about our paper at the Stanford ML Lunch Seminar and attended the NeurIPS 2024 conference in Vancouver.
September 2024: Our paper on LR schedules and scaling laws was accepted as a spotlight to NeurIPS 2024 :D This is after a spotlight presentation at the Workshop on Next Generation of Sequence Models (NGSM) and a surprising best poster award at the Workshop on Efficient Systems for Foundation Models (ES-FOMO) at ICML’24! Thanks to the organizers and the community for the great feedback.
May 2024: Our preprint for LR schedules and scaling laws is out!
April 2024: Had a lot of fun co-organizing a hackathon on LLM pretraining together with LauzHack and colleagues from MLO – congrats to all the winners! Links: Twitter, Linkedin
October 2023: Starting my PhD at EPFL :)
April 2023: I presented BaCaDI as an oral presentation at AISTATS 2023 (notable paper award, 32 / 1689 submissions) in Valencia (link to the conference). Slides are available here.
April 2023: I’m in Paris working with the Apple MLR team for the next few months – please reach out if you’re around!
January 2023: For a course project on data visualization, I created a blog post on Visualizing folktables, a benchmark dataset for fairness in ML. The goal of the post is to provide an overview and exploration of the dataset, hopefully being of value for researchers working with folktables.
January 2023: Our paper on BaCaDI: Bayesian Causal Discovery with Unknown Interventions was accepted to AISTATS 2023!
September 2022: I’m in Paris at École Polytechnique (well, Palaiseau) for the next few months for a semester abroad before the end of my Master. Please send me a message if you’re around and want to chat! :)
July 2022: Presenting our work on Bayesian Causal Discovery to the Causality Discussion Group.