Python SDK
Examples
Practical workflows built with the RunRL Python SDK.
Below are small, end-to-end demonstrations showing how to combine prompt datasets, reward functions, and the SDK. The examples intentionally use small datasets so you can reproduce them quickly.
Fruit Preference RL (Beginner)
Train a model to answer fruit-related questions while rewarding banana-centric answers.
from runrl import RunRLClient
import json
client = RunRLClient()
prompts = [
    {
        "prompt": [{"role": "user", "content": "What is your favorite fruit?"}],
        "expected_result": "banana",
    },
    {
        "prompt": [{"role": "user", "content": "Name a red fruit."}],
        "expected_result": "strawberry",
    },
]

with open("fruit_prompts.jsonl", "w", encoding="utf-8") as f:
    for row in prompts:
        f.write(json.dumps(row) + "\n")
prompt_file = client.files.upload_path("fruit_prompts.jsonl", file_type="prompt")
reward_code = """
def reward_fn(completion, **kwargs):
    response = completion[0]["content"].lower()
    expected = kwargs.get("expected_result", "").lower()
    if expected and expected in response:
        return 1.0
    if "banana" in response:
        return 0.8
    return 0.0
"""
reward_file = client.files.create_from_content(
    name="fruit_reward",
    file_type="reward_function",
    content=reward_code,
)

future = client.runs.create(
    model="Qwen/Qwen2.5-3B-Instruct",
    prompt_file_id=prompt_file.id,
    reward_file_id=reward_file.id,
    type="reward_function",
    completion_length=256,
    epochs=1,
)
run = future.result()
print("Run", run.id, "finished with", run.status)Math Reasoning (Intermediate)
Reward step-by-step answers that include the correct final result inside <answer> tags.
from runrl import RunRLClient

client = RunRLClient()

# Assumes math_prompts.jsonl already exists, in the same JSONL format as the
# fruit example, with each row's expected_result holding the final answer.
prompt_file = client.files.upload_path("math_prompts.jsonl", file_type="prompt")
reward_code = """
import re

def reward_fn(completion, **kwargs):
    response = completion[0]["content"]
    expected = (kwargs.get("expected_result") or "").strip()
    match = re.search(r"<answer>(.*?)</answer>", response, re.IGNORECASE)
    if not match:
        return 0.0
    answer = match.group(1).strip()
    if expected and expected in answer:
        return 1.0
    return 0.3
"""
reward_file = client.files.create_from_content(
    name="math_reward",
    file_type="reward_function",
    content=reward_code,
)

future = client.runs.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    prompt_file_id=prompt_file.id,
    reward_file_id=reward_file.id,
    type="reward_function",
    completion_length=512,
    epochs=2,
    learning_rate_multiplier=1.1,
)
run = future.result()
print(run.status)
Customer Support Tone (Advanced)
Use shared configurations to distribute recipes or replicate successful runs.
config = client.shared_configurations.create(
    name="Support Tone",
    visibility="unlisted",
    prompt_file_id="PROMPT_UUID",
    reward_file_id="REWARD_UUID",
    configuration={
        "model": "Qwen/Qwen2.5-3B-Instruct",
        "reward_mode": "reward_function",
        "completion_length": 256,
        "epochs": 2,
    },
)
print("Share via", config.shareable_url)
copy_payload = client.shared_configurations.copy(config.uuid)
print("Copied prompt id", copy_payload["prompt_file_id"])LoRA Training (Efficient Fine-Tuning)
LoRA Training (Efficient Fine-Tuning)
Train larger models faster with parameter-efficient LoRA fine-tuning.
from runrl import RunRLClient
client = RunRLClient()
# Standard LoRA configuration (recommended)
future = client.runs.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    prompt_file_id="PROMPT_UUID",
    reward_file_id="REWARD_UUID",
    type="reward_function",
    completion_length=1024,
    # Enable LoRA
    use_lora=True,
    lora_rank=32,
    lora_alpha=64,
)
run = future.result()
print(f"LoRA training completed: {run.status}")LoRA Benefits:
- 2-3× faster training compared to full fine-tuning
- Significantly lower memory usage – train larger models on smaller GPUs
- Often better generalization – reduces overfitting
Configuration Tips (applied in the sketch below):
- Standard LoRA: lora_rank=32, lora_alpha=64
- High-quality LoRA: lora_rank=64, lora_alpha=128
- Fast LoRA: lora_rank=16, lora_alpha=32
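If you switch between these settings often, one option is to keep the presets in a small dict and unpack the chosen one into runs.create. A minimal sketch (the preset names are just labels used here):
# Illustrative presets matching the tips above; pick one and unpack it into the run.
LORA_PRESETS = {
    "fast": {"lora_rank": 16, "lora_alpha": 32},
    "standard": {"lora_rank": 32, "lora_alpha": 64},
    "high_quality": {"lora_rank": 64, "lora_alpha": 128},
}

future = client.runs.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    prompt_file_id="PROMPT_UUID",
    reward_file_id="REWARD_UUID",
    type="reward_function",
    completion_length=1024,
    use_lora=True,
    **LORA_PRESETS["high_quality"],
)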
More comprehensive notebooks live in the repository under runrl-python/examples/.