Examples

Practical workflows built with the RunRL Python SDK.

Below are small, end-to-end demonstrations showing how to combine prompt datasets and reward functions using the SDK. The examples intentionally use small datasets so you can reproduce them quickly.

Fruit Preference RL (Beginner)

Train a model to answer fruit-related questions while rewarding banana-centric answers.

from runrl import RunRLClient
import json

client = RunRLClient()

prompts = [
    {
        "prompt": [{"role": "user", "content": "What is your favorite fruit?"}],
        "expected_result": "banana",
    },
    {
        "prompt": [{"role": "user", "content": "Name a red fruit."}],
        "expected_result": "strawberry",
    },
]

with open("fruit_prompts.jsonl", "w", encoding="utf-8") as f:
    for row in prompts:
        f.write(json.dumps(row) + "\n")

prompt_file = client.files.upload_path("fruit_prompts.jsonl", file_type="prompt")

reward_code = """
def reward_fn(completion, **kwargs):
    response = completion[0]["content"].lower()
    expected = kwargs.get("expected_result", "").lower()
    if expected and expected in response:
        return 1.0
    if "banana" in response:
        return 0.8
    return 0.0
"""

reward_file = client.files.create_from_content(
    name="fruit_reward",
    file_type="reward_function",
    content=reward_code,
)

future = client.runs.create(
    model="Qwen/Qwen2.5-3B-Instruct",
    prompt_file_id=prompt_file.id,
    reward_file_id=reward_file.id,
    type="reward_function",
    completion_length=256,
    epochs=1,
)

run = future.result()
print("Run", run.id, "finished with", run.status)

Math Reasoning (Intermediate)

Reward step-by-step answers that include the correct final result inside <answer> tags.
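
The prompt file follows the same JSONL schema as the fruit example, with the final answer stored in expected_result. You could build math_prompts.jsonl like this (the specific problems here are illustrative):

import json

math_prompts = [
    {
        "prompt": [{"role": "user", "content": "What is 17 * 24? Think step by step and put the final answer in <answer></answer> tags."}],
        "expected_result": "408",
    },
    {
        "prompt": [{"role": "user", "content": "A train travels 60 km in 45 minutes. What is its speed in km/h? Put the final answer in <answer></answer> tags."}],
        "expected_result": "80",
    },
]

with open("math_prompts.jsonl", "w", encoding="utf-8") as f:
    for row in math_prompts:
        f.write(json.dumps(row) + "\n")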

from runrl import RunRLClient

client = RunRLClient()

prompt_file = client.files.upload_path("math_prompts.jsonl", file_type="prompt")

reward_code = """
import re

def reward_fn(completion, **kwargs):
    response = completion[0]["content"]
    expected = (kwargs.get("expected_result") or "").strip()
    match = re.search(r"<answer>(.*?)</answer>", response, re.IGNORECASE)
    if not match:
        return 0.0
    answer = match.group(1).strip()
    if expected and expected in answer:
        return 1.0
    return 0.3
"""

reward_file = client.files.create_from_content(
    name="math_reward",
    file_type="reward_function",
    content=reward_code,
)

future = client.runs.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    prompt_file_id=prompt_file.id,
    reward_file_id=reward_file.id,
    type="reward_function",
    completion_length=512,
    epochs=2,
    learning_rate_multiplier=1.1,
)

run = future.result()
print(run.status)

Customer Support Tone (Advanced)

Use shared configurations to distribute recipes or replicate successful runs.

config = client.shared_configurations.create(
    name="Support Tone",
    visibility="unlisted",
    prompt_file_id="PROMPT_UUID",
    reward_file_id="REWARD_UUID",
    configuration={
        "model": "Qwen/Qwen2.5-3B-Instruct",
        "reward_mode": "reward_function",
        "completion_length": 256,
        "epochs": 2,
    },
)

print("Share via", config.shareable_url)

copy_payload = client.shared_configurations.copy(config.uuid)
print("Copied prompt id", copy_payload["prompt_file_id"])

LoRA Training (Efficient Fine-Tuning)

Train larger models faster with parameter-efficient LoRA fine-tuning.

from runrl import RunRLClient

client = RunRLClient()

# Standard LoRA configuration (recommended)
future = client.runs.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    prompt_file_id="PROMPT_UUID",
    reward_file_id="REWARD_UUID",
    type="reward_function",
    completion_length=1024,
    # Enable LoRA
    use_lora=True,
    lora_rank=32,
    lora_alpha=64,
)

run = future.result()
print(f"LoRA training completed: {run.status}")

LoRA Benefits:

  • 2-3× faster training compared to full fine-tuning
  • Significantly lower memory usage – train larger models on smaller GPUs
  • Often better generalization – reduces overfitting

Configuration Tips (applied in the sketch below):

  • Standard LoRA: lora_rank=32, lora_alpha=64
  • High-quality LoRA: lora_rank=64, lora_alpha=128
  • Fast LoRA: lora_rank=16, lora_alpha=32
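
One way to keep these presets handy is a small local dict that you unpack into client.runs.create. The preset names below are just local labels, not SDK parameters:

# Local helper mapping preset names to the LoRA settings from the tips above.
LORA_PRESETS = {
    "fast": {"use_lora": True, "lora_rank": 16, "lora_alpha": 32},
    "standard": {"use_lora": True, "lora_rank": 32, "lora_alpha": 64},
    "high_quality": {"use_lora": True, "lora_rank": 64, "lora_alpha": 128},
}

future = client.runs.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    prompt_file_id="PROMPT_UUID",
    reward_file_id="REWARD_UUID",
    type="reward_function",
    completion_length=1024,
    **LORA_PRESETS["standard"],
)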

More comprehensive notebooks live in the repository under runrl-python/examples/.
