Python SDK
Examples
Practical workflows built with the RunRL Python SDK.
Below are small, end-to-end demonstrations showing how to combine prompt datasets, reward functions, and the SDK. The examples intentionally use small datasets so you can reproduce them quickly.
Fruit Preference RL (Beginner)
Train a model to answer fruit-related questions while rewarding banana-centric answers.
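A reward function for this run only needs to detect banana-centric answers. The sketch below assumes a `(prompt, completion) -> float` reward signature, which is an illustrative assumption; check the SDK reference for the exact interface your RunRL version expects.

```python
import re

def banana_reward(prompt: str, completion: str) -> float:
    """Return 1.0 if the completion mentions bananas, else 0.0.

    NOTE: the (prompt, completion) -> float signature is an assumption
    for illustration, not the confirmed RunRL interface.
    """
    # Word-boundary match so "banana" and "bananas" count, "bananarama" doesn't.
    if re.search(r"\bbananas?\b", completion, re.IGNORECASE):
        return 1.0
    return 0.0
```

Binary rewards like this are easy to debug; you can shape them later (e.g. partial credit for other fruit) once the run is working end to end.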
Math Reasoning (Intermediate)
Reward step-by-step answers that include the correct final result inside `<answer>` tags.
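One way to score this is to extract the `<answer>` span from the completion and compare it against the expected result from the dataset. The snippet below is a minimal sketch; the `expected` parameter and the reward signature are assumptions for illustration, since how ground-truth answers reach the reward function depends on your dataset setup.

```python
import re

def math_reward(prompt: str, completion: str, expected: str) -> float:
    """Return 1.0 if the text inside <answer>...</answer> matches `expected`.

    NOTE: passing `expected` directly is an illustrative assumption; in a
    real run the ground truth would come from your prompt dataset.
    """
    # Non-greedy match so only the first <answer> block is captured.
    match = re.search(r"<answer>\s*(.*?)\s*</answer>", completion, re.DOTALL)
    if match is None:
        return 0.0  # no parsable answer: no reward
    return 1.0 if match.group(1).strip() == expected.strip() else 0.0
```

Requiring the tags at all (rather than grepping the whole completion) also rewards the model for keeping its reasoning and its final answer separable.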
Customer Support Tone (Advanced)
Use shared configurations to distribute recipes or replicate successful runs.
LoRA Training (Efficient Fine-Tuning)
Train larger models faster with parameter-efficient LoRA fine-tuning.
LoRA Benefits:
- 2-3× faster training compared to full fine-tuning
- Significantly lower memory usage – train larger models on smaller GPUs
- Often better generalization – reduces overfitting
Configuration Tips:
- Standard LoRA: `lora_rank=32, lora_alpha=64`
- High-quality LoRA: `lora_rank=64, lora_alpha=128`
- Fast LoRA: `lora_rank=16, lora_alpha=32`
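The presets above all follow the common `lora_alpha = 2 × lora_rank` convention. A small helper like the following keeps them in one place; the preset names and the idea of passing these as keyword arguments to a run are assumptions for illustration, not a confirmed SDK API.

```python
# Hypothetical presets mirroring the configuration tips above.
# The lora_rank / lora_alpha names match the tips; everything else
# here is an illustrative assumption.
LORA_PRESETS = {
    "fast":         {"lora_rank": 16, "lora_alpha": 32},
    "standard":     {"lora_rank": 32, "lora_alpha": 64},
    "high_quality": {"lora_rank": 64, "lora_alpha": 128},
}

def lora_config(preset: str) -> dict:
    """Look up a preset and sanity-check the alpha = 2 x rank convention."""
    cfg = LORA_PRESETS[preset]
    assert cfg["lora_alpha"] == 2 * cfg["lora_rank"]
    return dict(cfg)  # copy so callers can tweak without mutating the preset
```

Higher ranks give the adapter more capacity at the cost of speed and memory, which is why "fast" and "high-quality" sit at opposite ends of the rank scale.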
More comprehensive notebooks live in the repository under `runrl-python/examples/`.