RunRL

Overview

The RunRL platform provides a simple yet powerful interface for training LLMs with reinforcement learning on arbitrary prompt and reward files.

Setup

An RL run consists of three main components:

  • a base model, such as Qwen3 4B
  • prompts, the inputs to the model
  • either a reward function or an environment class, which scores model outputs
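A reward function maps a prompt and a model completion to a numeric score. The sketch below is illustrative only: the exact signature and file format RunRL expects may differ, and the scoring logic (matching "Paris", penalizing length) is a made-up example, not part of the platform.

```python
# Hypothetical reward function sketch -- RunRL's actual expected
# signature may differ. Scores one completion for one prompt.
def reward_function(prompt: str, completion: str) -> float:
    """Reward completions that mention 'Paris'; penalize verbosity."""
    score = 1.0 if "Paris" in completion else 0.0
    # Illustrative heuristic: dock points for overly long answers.
    if len(completion.split()) > 50:
        score -= 0.5
    return score

print(reward_function("What is the capital of France?", "Paris"))  # -> 1.0
```

The key property is that the function is deterministic given its inputs and returns a scalar the RL algorithm can optimize against.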

Additionally, you can specify:

  • MCP tools for your model to call
  • hyperparameters such as the learning rate multiplier and the number of epochs
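Putting the pieces together, a run pairs a base model with a prompt file, a reward file, and optional hyperparameters. The configuration below is a hypothetical sketch; the real RunRL API or CLI may use different field names and file schemas.

```python
import json

# Hypothetical run configuration -- field names are assumptions,
# not RunRL's documented schema.
run_config = {
    "base_model": "Qwen/Qwen3-4B",
    "prompt_file": "prompts.jsonl",   # inputs to the model
    "reward_file": "reward.py",       # scores model outputs
    "hyperparameters": {
        "learning_rate_multiplier": 1.0,
        "epochs": 3,
    },
}

# Serializing to JSON shows the run is fully described by plain data.
print(json.dumps(run_config, indent=2))
```

Keeping the run description as plain data makes it easy to version-control alongside the prompt and reward files it references.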