Hannah Erlebach

DPhil in Machine Learning @ FLAIR, University of Oxford

Oxford · UK · [first].[last]@gmail.com

About

I'm a first-year DPhil student in machine learning, supervised by Jakob Foerster and funded by the Cooperative AI PhD fellowship. I'm interested in how diverse minds represent the world and, in particular, each other's representations: to what extent can communication, concept transfer and mutual understanding arise between heterogeneous or possibly incommensurable representations? I posit that better models of others' internal representations of the world allow for more effective knowledge transfer, as well as improved empathetic and cooperative capabilities.

Relatedly, I'm interested in the structures of knowledge across different disciplines and cultures and how they are related. I'm also interested in the interplay between natural and artificial intelligence research, and am currently running a reading group on active inference in Oxford.

I did my master's in machine learning at University College London, where my thesis was supervised by Tim Rocktäschel, and my undergraduate degree in maths at Cambridge.

Research

My previous research has focused on AI safety and cooperation in language models and multi-agent reinforcement learning settings.

  • DUA: Discovering Universal Attacks Using Foundation Models. Master's thesis for UCL MSc in Machine Learning, 2025.
  • Guiding Evolution of Artificial Life Using Vision-Language Models. Nikhil Baid, Hannah Erlebach, Paul Hellegouarch and Frederico Wieser. Published at the Artificial Life Conference 2025. [arXiv]
  • Mitigating Goal Misgeneralisation via Minimax Regret. Karim Abdel Sadek, Matthew Farrugia-Roberts, Usman Anwar, Hannah Erlebach, Christian Schroeder de Witt, David Krueger and Michael Dennis. Published at the Reinforcement Learning Conference 2025. [arXiv]
  • RACCOON: Regret-based Adaptive Curricula for Cooperation. Hannah Erlebach and Jonathan Cook. Published in the CoCoMARL workshop at the Reinforcement Learning Conference 2024.
  • Welfare Diplomacy: Benchmarking Language Model Cooperation. Gabriel Mukobi, Hannah Erlebach, Niklas Lauffer, Lewis Hammond, Alan Chan and Jesse Clifton. Published in the SoLaR workshop at NeurIPS 2023. [arXiv]