RL

Posts about reinforcement learning.

Diversity as the bottleneck in Self-Play (06 May 2026) technical rl nlp

Exploring plateaus in prior self-play setups.

Introduction to Policy Gradient for LMs (09 Feb 2026) technical rl nlp

A basic introduction to policy gradient for language models.

Results Replicating L1 for Tulu (02 Feb 2026) technical rl nlp

Results replicating the recent L1 paper.

Reinforcement Learning with Pokemon (02 Aug 2021) technical rl games

Looking into using reinforcement learning to train Pokemon battle bots.