File Formats
Detailed guide on how to format prompt files and reward functions for RunRL
File Formats
This guide explains how to properly format prompt files and reward functions for use with RunRL.
Prompt Files
Prompt files contain the examples that your model will learn from during training. They define the inputs that will be sent to the model and the expected outputs.
Format Requirements
Prompt files must be in JSONL (JSON Lines) format, with each line containing a complete, valid JSON object. The file extension should be .jsonl
.
Required Structure
Each line in your prompt file must include:
prompt
: An array of message objects, where each message has:role
: The role of the speaker (e.g., "system", "user")content
: The actual message text
Additional Fields
You can include additional fields that your reward function might need, such as expected_result
for evaluating correctness.
Example Prompt File
Important Notes
- Each line must be a complete, valid JSON object
- No trailing commas are allowed
- The file should contain a wide distribution of prompts that your model may potentially encounter
- Generally, aim to have at least 100 distinct examples, but this depends on your specific task
Reward Functions
Reward functions evaluate the model's responses and provide feedback signals that guide the learning process.
Format Requirements
Reward functions must be Python scripts (.py
files) that define a specific function called reward_func
.
Function Signature
Parameters
prompts
: A list of the prompts that were sent to the model. Each item corresponds to a prompt object from your JSONL file.completions
: A list of completions from the model. Each completion is typically a list itself, where the first element contains the model's response. Access the generated text usingcompletion[0]['content']
for each completion.**kwargs
: Additional data from your prompt JSONL file entries (likeexpected_result
) can be accessed usingkwargs.get('your_key_name')
.
Return Value
The function must return a list[float]
, where each float is the reward score for the corresponding completion.
Example Reward Function
This example reward function evaluates math problems, rewarding correct answers and penalizing the use of numerals in the thinking process:
Best Practices
- Document your reward criteria clearly
- Handle missing or invalid inputs gracefully
- Use helper functions for complex logic
- Consider both positive rewards for desired behaviors and penalties for undesired behaviors
- Test your reward function with sample completions before using it for training