Hi, I’m currently a Research Fellow at Anthropic, focused on AI safety and security. I’m also finishing my BA/MS at UC Berkeley, where I’m advised by Dawn Song.

I’d love to chat or collaborate on LLM safety or RL projects. Lately I’ve been thinking about defenses against adversarial distillation and about building agents for automated interpretability research.