I’m a PhD student in Machine Learning at EPFL in the MLO group, where I am supervised by Prof. Martin Jaggi. I am (like everyone?) intrigued by the power of pretrained models and their applications, in particular through open science and open source. In my current research, I am therefore exploring:

  • Methods for scaling open language models & scaling behaviour
  • Training, optimization and efficient algorithms for large (and small!) models
  • Adaptive computing and new architectures for deep learning

and more :)

Prior to my PhD, I completed my Master’s and Bachelor’s degrees in Computer Science at ETH Zürich, working with the LAS group and Prof. Andreas Krause. I was also a visiting Student Researcher with the Apple MLR team in Paris, hosted by Prof. Marco Cuturi.
During my studies, I had the chance to spend two semesters abroad: one at École Polytechnique in France and one at the University of Toronto in Canada. I also did an internship at Spacemaker AI; it’s fun to look back at the interview I gave here.

For EPFL students: If you are interested in a project (check out our MLO project page for guidelines and directions), please feel free to reach out via email. I’m very happy to supervise motivated students!

Feel free to get in touch via email:
alexander.hagele [at] epfl [dot] ch

Publications


Full list also here or on Google Scholar.

Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations.
Alexander Hägele, Elie Bakouch, Atli Kosson, Loubna Ben Allal, Leandro von Werra and Martin Jaggi.
arXiv Preprint 2405.18392, 2024.
Spotlight presentation at the Workshop on Next Generation of Sequence Models (NGSM) at ICML’24.
Also poster at the Workshop on Efficient Systems for Foundation Models (ES-FOMO) at ICML’24.
[PDF | bibtex | Code]

BaCaDI: Bayesian Causal Discovery with Unknown Interventions.
Alexander Hägele, Jonas Rothfuss, Lars Lorch, Vignesh Ram Somnath, Bernhard Schölkopf and Andreas Krause.
Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS 2023).
Also presented at 1st Workshop on Causal Representation Learning at UAI 2022 (Link).
Oral presentation at AISTATS 2023 (notable paper award), ranked top 32 among 1689 submissions (top 1.9%).
[PDF | bibtex | Details | Code | Slides]

Robustness Certification with Generative Models.
Matthew Mirman, Alexander Hägele, Pavol Bielik, Timon Gehr and Martin Vechev.
Proceedings of the 42nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2021).
[PDF | bibtex | DOI]

News

  • May 2024: Our preprint on LR schedules and scaling laws is out!
  • April 2024: Had a lot of fun co-organizing a hackathon on LLM pretraining together with LauzHack and colleagues from MLO – congrats to all the winners! Links: Twitter, LinkedIn
  • October 2023: Starting my PhD at EPFL :)
  • April 2023: I presented BaCaDI as an oral at AISTATS 2023 (notable paper award, top 32 of 1689 submissions) in Valencia (link to the conference). Slides are available here.
  • April 2023: I’m in Paris working with the Apple MLR team for the next few months – please reach out if you’re around!
  • January 2023: For a course project on data visualization, I created a blog post on Visualizing folktables, a benchmark dataset for fairness in ML. The post provides an overview and exploration of the dataset, hopefully of value to researchers working with folktables.
  • January 2023: Our paper on BaCaDI: Bayesian Causal Discovery with Unknown Interventions was accepted to AISTATS 2023!
  • September 2022: I’m in Paris at École Polytechnique (well, Palaiseau) for the next few months for a semester abroad before the end of my Master’s. Please send me a message if you’re around and want to chat! :)
  • July 2022: Presenting our work on Bayesian Causal Discovery to the Causality Discussion Group.