Quick Start
Quickest Start
You can load this entire example in one click with this configuration!
Quick Start
Total cost: ~$25
Let's train a model to generate a funny joke with RL!
Start by going to the create training run page.
There, select the "reward function" option:
Next, select "Qwen3 4B" as the base model:
It's small enough to train quickly but versatile and intelligent enough to be well-suited for RL!
Now, we need to create a prompt file. A prompt file is a JSON Lines (JSONL) file, with one JSON object per line, in the following format:
The most important requirements for a prompt file are that each line is a JSON object and that each object contains a prompt key, which should point to a list of messages in the format {"role": "user", "content": "text"}.
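Before uploading, it can help to check a prompt file against the two requirements above. The following is a minimal sketch, not part of the platform: the function names are hypothetical, and the check that the message list is non-empty is an added assumption.

```python
import json


def validate_prompt_line(line: str) -> None:
    """Raise ValueError if one line of a prompt file is malformed."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc

    # Each line must be a JSON object that carries a "prompt" key.
    if not isinstance(record, dict) or "prompt" not in record:
        raise ValueError('each line must be a JSON object with a "prompt" key')

    # "prompt" must point to a list of messages (non-empty is our assumption).
    messages = record["prompt"]
    if not isinstance(messages, list) or not messages:
        raise ValueError('"prompt" must be a non-empty list of messages')

    # Each message looks like {"role": "user", "content": "text"}.
    for message in messages:
        if not isinstance(message, dict):
            raise ValueError("each message must be a JSON object")
        if not isinstance(message.get("role"), str):
            raise ValueError('each message needs a string "role"')
        if not isinstance(message.get("content"), str):
            raise ValueError('each message needs a string "content"')


def validate_prompt_text(text: str) -> int:
    """Validate every non-blank line and return the number of prompts."""
    count = 0
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            validate_prompt_line(line)
        except ValueError as exc:
            raise ValueError(f"line {number}: {exc}") from exc
        count += 1
    return count
```

Calling `validate_prompt_text(open("prompts.jsonl").read())` on a local file (the file name is only an example) returns the number of prompts, or raises an error naming the first bad line.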
In this example, the prompt can be the same on every line, so our JSONL will look something like:
It's good practice to have the prompts file be at least 50 lines for batch efficiency reasons, so let's make that file slightly longer. If you don't want to do it yourself, you can download it here.
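If you would rather generate the file than write it by hand, a short script is enough. This is a sketch under assumptions: the prompt wording, the output file name `prompts.jsonl`, and the line count of 64 are illustrative choices, not values taken from the example configuration.

```python
import json
from pathlib import Path

# Assumed prompt wording; use whatever constant prompt you prefer.
JOKE_PROMPT = "Tell me a funny joke."
# Any count of at least 50 follows the batch-efficiency advice above.
NUM_LINES = 64


def build_prompt_lines(prompt_text: str, count: int) -> list[str]:
    """Return `count` identical JSONL lines, each holding one user message."""
    record = {"prompt": [{"role": "user", "content": prompt_text}]}
    line = json.dumps(record)
    return [line] * count


def write_prompt_file(path: Path, prompt_text: str = JOKE_PROMPT,
                      count: int = NUM_LINES) -> Path:
    """Write the prompt lines to `path`, one JSON object per line."""
    lines = build_prompt_lines(prompt_text, count)
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    out = write_prompt_file(Path("prompts.jsonl"))
    print(f"wrote {NUM_LINES} prompts to {out}")
```

Because the prompt is constant, every line is identical; variety in the training data comes from the model's sampled completions, not from the prompts.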
Then, upload it to the prompts file section:
For the reward function, paste the following reward:
(You'll need an OpenAI API key for this.)
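To give a rough idea of how a reward that relies on an external judge model can be structured, here is a hypothetical sketch. It is not the platform's reward and does not use the signature the platform expects. The names `joke_reward`, `parse_score`, and `ask_judge`, the judge instructions, and the 1 to 10 scale are all assumptions. The judge call is passed in as a plain function so that no particular client API is assumed; in practice it would wrap a request to a hosted model using your API key.

```python
import re
from typing import Callable

# Assumed judge instructions; the wording and the scale are illustrative.
JUDGE_INSTRUCTIONS = (
    "Rate how funny the following joke is on a scale from 1 to 10. "
    "Reply with the number only."
)


def parse_score(reply: str, low: float = 1.0, high: float = 10.0) -> float:
    """Map the first number in the judge's reply to a reward in [0, 1].

    Returns 0.0 when the reply contains no number, so a malformed judge
    response never produces a high reward.
    """
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        return 0.0
    value = min(max(float(match.group()), low), high)  # clamp to the scale
    return (value - low) / (high - low)


def joke_reward(completion: str, ask_judge: Callable[[str], str]) -> float:
    """Score one completion by asking a judge (hypothetical callable)."""
    judge_prompt = f"{JUDGE_INSTRUCTIONS}\n\nJoke:\n{completion}"
    return parse_score(ask_judge(judge_prompt))
```

Rescaling to a fixed range and defaulting to zero on unparseable replies keeps the reward bounded, which tends to make RL training less sensitive to occasional odd judge outputs.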
Now, press "run"! In about 20 minutes, you'll get a model that's ever-so-slightly funnier than before:
Of course, you could run it for more epochs to make it even funnier! You can also deploy your model for free (but slow) inference, or email us if you'd like to download your weights.
Happy training!