publications

Selected publications (with papers at NeurIPS, TMLR, ICLR, and other venues)

2024

  1. Preprint
    MALT: Improving Reasoning with Multi-Agent LLM Training
    Sumeet Ramesh Motwani, Chandler Smith, Rocktim Jyoti Das, Markian Rybchuk, Philip H. S. Torr, Ivan Laptev, Fabio Pizzati, and 2 more authors
    arXiv preprint, Dec 2024
  2. NeurIPS 2024
    Secret Collusion among AI Agents: Multi-Agent Deception via Steganography
    Sumeet Ramesh Motwani, Mikhail Baranchuk, Martin Strohmeier, Vijay Bolina, Philip H. S. Torr, Lewis Hammond, and Christian Schroeder de Witt
    Thirty-Eighth Conference on Neural Information Processing Systems, Feb 2024
  3. Preprint
    Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents
    Pranav Putta, Edmund Mills, Naman Garg, Sumeet Motwani, Chelsea Finn, Divyansh Garg, and Rafael Rafailov
    arXiv preprint, Aug 2024
  4. NeurIPS 2024
    Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits
    Andis Draguns, Andrew Gritsevskiy, Sumeet Ramesh Motwani, Charlie Rogers-Smith, Jeffrey Ladish, and Christian Schroeder de Witt
    Thirty-Eighth Conference on Neural Information Processing Systems, Jun 2024
  5. TMLR
    Foundational Challenges in Assuring Alignment and Safety of Large Language Models
    Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, and 35 more authors
    Transactions on Machine Learning Research, Apr 2024

2023

  1. ICLR 2024
    STARC: A General Framework For Quantifying Differences Between Reward Functions
    J. Skalse, L. Farnik, Sumeet Ramesh Motwani, E. Jenner, A. Gleave, and A. Abate
    The Twelfth International Conference on Learning Representations, Sep 2023
  2. NeurIPS MASEC
    A Perfect Collusion Benchmark: How can AI agents be prevented from colluding with information-theoretic undetectability?
    Sumeet Ramesh Motwani, M. Baranchuk, L. Hammond, and C. Schroeder de Witt
    Multi-Agent Security Workshop, NeurIPS’23, Oct 2023