publications

publications by category in reverse chronological order. generated by jekyll-scholar.

2024

  1. Under Review
    Secret Collusion Among Generative AI Agents
    Sumeet Ramesh Motwani, Mikhail Baranchuk, Martin Strohmeier, Vijay Bolina, Philip H. S. Torr, Lewis Hammond, and Christian Schroeder de Witt
    2024

2023

  1. ICLR 2024
    STARC: A General Framework For Quantifying Differences Between Reward Functions
    J. Skalse, L. Farnik, S. R. Motwani, E. Jenner, A. Gleave, and A. Abate
    The Twelfth International Conference on Learning Representations, Sep 2023
  2. NeurIPS MASEC
    A Perfect Collusion Benchmark: How can AI agents be prevented from colluding with information-theoretic undetectability?
    S. R. Motwani, M. Baranchuk, L. Hammond, and C. Schroeder de Witt
    In Multi-Agent Security Workshop, NeurIPS’23, Oct 2023