Overview
The RunRL platform provides a simple yet powerful interface for training LLMs with reinforcement learning on arbitrary prompt and reward files.
Setup
An RL run consists of three main components:
- a base model, such as Qwen3 4B
- prompts, the inputs to the model
- either a reward function or an environment class, which scores model outputs (see the sketch after this list)
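RunRL's exact file formats are not shown in this section, so the following is a minimal sketch under common conventions. A prompt file is assumed here to be JSONL with one record per line; the `prompt` field name is an assumption, not a confirmed schema:

```json
{"prompt": "What is 17 * 24? Give your final answer in \\boxed{}."}
```

A reward function is assumed to be a Python function that maps a model output to a score. The name `reward_function` and its `(prompt, completion)` signature are illustrative, not RunRL's documented interface:

```python
# Hypothetical reward function -- the name and signature are assumptions,
# not RunRL's confirmed API.
def reward_function(prompt: str, completion: str) -> float:
    """Return a scalar reward for one model output; higher is better."""
    reward = 0.0
    if "\\boxed{" in completion:  # reward a final boxed answer
        reward += 1.0
    if len(completion) < 2000:    # lightly prefer concise completions
        reward += 0.5
    return reward
```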
Additionally, you can specify:
- MCP tools for your model to call
- hyperparameters such as the learning rate multiplier and the number of epochs (a configuration sketch follows below)
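To make the pieces concrete, here is a hedged sketch of how a run might be configured. Every field name below (`model`, `prompt_file`, `reward_file`, `learning_rate_multiplier`, `epochs`) is an illustrative assumption mirroring the components above, not RunRL's confirmed schema:

```python
# Hypothetical run configuration -- field names are assumptions that
# mirror the components and optional settings described above.
run_config = {
    "model": "Qwen/Qwen3-4B",          # base model
    "prompt_file": "prompts.jsonl",    # inputs to the model
    "reward_file": "reward.py",        # scores model outputs
    "learning_rate_multiplier": 2.0,   # optional hyperparameter
    "epochs": 3,                       # optional hyperparameter
}
```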