Sumeet Motwani

I am a Machine Learning PhD student at the University of Oxford, where my research is funded by Eric Schmidt and CAIF. I'm advised by Philip Torr and Christian Schroeder.

My work focuses on RL post-training, multi-agent systems, and AI security. I'm particularly interested in meta-RL, open-endedness, and methods for measuring and improving long-horizon LLM agent capabilities.

During my PhD, I've spent time at Microsoft Research and Google X. At MSR, I was part of AI Frontiers and worked on self-play and RL rewards for open-ended domains. Previously, I was an undergrad at UC Berkeley, where I was a member of Berkeley AI Research (advised by Dan Hendrycks) and Cal Boxing. Feel free to get in touch!

Contact

Google Scholar / Twitter / LinkedIn / Email

Selected Papers

Recent Preprints

h1: Bootstrapping LLMs to Reason over Longer Horizons via Reinforcement Learning

Sumeet Ramesh Motwani*, A. Ivanova*, Z. Cai, P. Torr, R. Islam, S. Shah, C.S. de Witt, C. London
arXiv preprint, 2025

Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents

P. Putta, E. Mills, N. Garg, Sumeet Ramesh Motwani, C. Finn, D. Garg, R. Rafailov
arXiv preprint, 2024

Conference Publications

MALT: Improving Reasoning with Multi-Agent LLM Training

Secret Collusion Among Generative AI Agents

REAL: Benchmarking Autonomous Agents on Deterministic Simulations of Real Websites

Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits

STARC: A General Framework For Quantifying Differences Between Reward Functions

Foundational Challenges in Assuring Alignment and Safety of Large Language Models