Building Stateful MCP Environments
A comprehensive tutorial for creating stateful MCP environments for reinforcement learning
This tutorial teaches you how to build stateful environments using the Model Context Protocol (MCP). You'll learn how to create environments that maintain state between agent interactions, enabling complex multi-step tasks for reinforcement learning.
What You'll Build
By the end of this tutorial, you'll understand how to:
- Create an MCP server that maintains per-session state
- Build a gym-style environment wrapper that communicates with your MCP server
- Generate training datasets for your custom environment
- Deploy and test your stateful environment locally
Prerequisites
- Basic understanding of reinforcement learning concepts (states, actions, rewards)
- Familiarity with Python and async programming
- Understanding of REST APIs and HTTP concepts
Understanding the Architecture
A stateful MCP environment consists of four key components:
1. The MCP Server (State Owner)
The MCP server owns all gameplay state and exposes tools that agents can call. Unlike stateless APIs, it maintains session-specific data across multiple tool invocations.
Key responsibilities:
- Store per-session state (scores, history, resources, etc.)
- Expose tools for agent actions (submit, query, manipulate state)
- Provide reward calculations and observations
- Manage session lifecycle and cleanup
2. The Environment Wrapper
A minimal gym-style wrapper that translates between the RL harness and your MCP server.
Key responsibilities:
- Initialize with server connection details
- Forward agent actions to MCP tools
- Extract rewards and observations from MCP responses
- Signal episode termination
3. The Dataset Generator
Creates training tasks with initial prompts and state parameters.
Key responsibilities:
- Generate JSONL files with task specifications
- Include server IDs and tool names
- Provide agent instructions and hints
- Create curriculum variations (difficulty levels, parameters)
4. The MCP Client Config
Registers your server so the runtime can discover and connect to it.
Key responsibilities:
- Map server names to connection endpoints
- Specify transport types (HTTP, stdio)
- Configure authentication if needed
Tutorial: Building a Number Guessing Game
Let's build a complete stateful environment step by step. We'll create a game where agents guess numbers within a range and receive feedback.
Step 1: Define Your Session State
First, define what data you need to track per session:
Design tips:
- Keep public vs. private state separate (don't leak the target!)
- Limit history size to prevent memory bloat
- Store both raw data and computed metrics
- Make state serializable for debugging
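The tips above can be sketched as a small dataclass. This is a minimal illustration, not the tutorial's original code: the field names, the `max_history` default, and the `public_view()` / `debug_dump()` helpers are all assumptions chosen for the number guessing game.

```python
import json
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SessionState:
    """Per-session state for the guessing game (illustrative field names)."""

    target: int                 # private: never returned to the agent
    low: int = 1
    high: int = 100
    max_history: int = 50       # assumed cap on stored guesses
    attempts: int = 0           # computed metric kept alongside raw history
    history: deque = field(default_factory=deque)

    def __post_init__(self) -> None:
        # Re-wrap history in a bounded deque so old entries are dropped automatically.
        self.history = deque(self.history, maxlen=self.max_history)

    def record_guess(self, guess: int) -> str:
        """Store a guess and return feedback: 'low', 'high', or 'correct'."""
        self.attempts += 1
        if guess < self.target:
            feedback = "low"
        elif guess > self.target:
            feedback = "high"
        else:
            feedback = "correct"
        self.history.append({"guess": guess, "feedback": feedback})
        return feedback

    def public_view(self) -> dict:
        """Agent-visible state; the target is deliberately omitted."""
        return {
            "low": self.low,
            "high": self.high,
            "attempts": self.attempts,
            "history": list(self.history),
        }

    def debug_dump(self) -> str:
        """Full state as JSON, including the target. For server-side logs only."""
        return json.dumps({"target": self.target, **self.public_view()})
```

Keeping `public_view()` and `debug_dump()` as separate methods makes it harder to return the private target to the agent by accident.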
Step 2: Build the MCP Server
Now create the MCP server that manages sessions and exposes tools:
Key patterns:
- Session management: Use _extract_session_id() to get a stable session key from request metadata
- Thread safety: Protect shared session dictionary with a lock
- Tool schema: Define clear input schemas so agents know what parameters to send
- Structured responses: Return both human-readable text and machine-parseable JSON
- Lazy initialization: Create sessions on first access with optional parameters
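The patterns above can be illustrated without any MCP SDK code. The sketch below uses only the standard library; in a real server the handler would be registered as an MCP tool. The `session_id` metadata field, the dictionary-based session, and the shape of the return value are assumptions made for illustration.

```python
import json
import random
import threading

_sessions: dict = {}
_sessions_lock = threading.Lock()

# Input schema in JSON Schema form, so agents know which parameters to send.
GUESS_TOOL_SCHEMA = {
    "type": "object",
    "properties": {"guess": {"type": "integer"}},
    "required": ["guess"],
}


def _extract_session_id(metadata: dict) -> str:
    """Return a stable session key; the 'session_id' field is an assumption."""
    return str(metadata.get("session_id", "default"))


def get_or_create_session(session_id: str, low: int = 1, high: int = 100) -> dict:
    """Lazily create the session on first access, under the lock."""
    with _sessions_lock:
        if session_id not in _sessions:
            _sessions[session_id] = {
                "target": random.randint(low, high),
                "low": low,
                "high": high,
                "attempts": 0,
            }
        return _sessions[session_id]


def handle_guess(metadata: dict, arguments: dict) -> dict:
    """Tool handler: returns human-readable text plus a JSON payload."""
    session = get_or_create_session(_extract_session_id(metadata))
    guess = int(arguments["guess"])
    with _sessions_lock:
        session["attempts"] += 1
        attempts = session["attempts"]
        target = session["target"]
    if guess == target:
        feedback = "correct"
    else:
        feedback = "low" if guess < target else "high"
    payload = {
        "feedback": feedback,
        "attempts": attempts,
        "reward": 1.0 if feedback == "correct" else 0.0,
    }
    # The target itself is never placed in the response.
    return {"text": f"Guess {guess}: {feedback}", "json": json.dumps(payload)}
```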
Step 3: Create the Environment Wrapper
Build a minimal gym-style wrapper that connects to your MCP server:
Design principles:
- Minimal wrapper: The wrapper doesn't duplicate state logic—that lives in the MCP server
- Single-step episodes: This example ends after one step() call, but you can extend it for multi-step episodes
- Error handling: Gracefully handle JSON parsing errors and missing fields
- Info dict: Return debugging details to help diagnose issues
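A wrapper following these principles might look like the sketch below. It takes the MCP call as an injected `call_tool` callable so the example stays self-contained; that callable, the `guess` tool name, and the `reward` / `feedback` payload fields are assumptions, and a real wrapper would use whatever client your RL harness provides.

```python
import json
from typing import Callable


class GuessEnv:
    """Gym-style wrapper; all game state lives behind `call_tool`."""

    def __init__(self, call_tool: Callable[[str, dict], str], tool_name: str = "guess"):
        self._call_tool = call_tool      # hypothetical: (tool name, arguments) -> JSON string
        self._tool_name = tool_name

    def reset(self) -> str:
        return "Guess the hidden number."

    def step(self, action: dict):
        raw = self._call_tool(self._tool_name, action)
        info = {"raw": raw}              # keep the raw response for debugging
        try:
            payload = json.loads(raw)
            reward = float(payload.get("reward", 0.0))
            observation = payload.get("feedback", "")
        except (TypeError, ValueError, AttributeError) as exc:
            # Malformed or non-object JSON: fall back to a zero reward.
            reward, observation = 0.0, ""
            info["error"] = str(exc)
        done = True                      # single-step episode
        return observation, reward, done, info
```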
Step 4: Generate Training Data
Create a dataset generator that produces JSONL tasks:
Dataset design tips:
- Curriculum learning: Mix easy and hard tasks for better training
- Explicit hints: Guide agents toward good strategies
- Metadata: Include task IDs and difficulty for analysis
- Variation: Randomize parameters to improve generalization
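As a sketch of these tips, the generator below writes one JSON object per line. The record fields (`task_id`, `server_id`, `tool_name`, `params`, `prompt`), the server name `number-guess`, and the difficulty ranges are illustrative assumptions; match them to whatever schema your training harness expects.

```python
import json
import random

# Assumed bounds for the upper end of the guessing range, per difficulty.
DIFFICULTY_BOUNDS = {"easy": (10, 50), "medium": (100, 500), "hard": (1000, 5000)}


def generate_tasks(n: int, seed: int = 0) -> list:
    """Build n task records with randomized ranges and mixed difficulty."""
    rng = random.Random(seed)            # seeded for reproducible datasets
    tasks = []
    for i in range(n):
        difficulty = rng.choice(sorted(DIFFICULTY_BOUNDS))
        high = rng.randint(*DIFFICULTY_BOUNDS[difficulty])
        tasks.append({
            "task_id": f"guess-{i:04d}",
            "difficulty": difficulty,
            "server_id": "number-guess",  # hypothetical server name
            "tool_name": "guess",         # hypothetical tool name
            "params": {"low": 1, "high": high},
            "prompt": (
                f"Guess the hidden number between 1 and {high}. "
                "Hint: halve the remaining range with each guess."
            ),
        })
    return tasks


def write_jsonl(tasks: list, path: str) -> None:
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for task in tasks:
            f.write(json.dumps(task) + "\n")
```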
Step 5: Configure the MCP Client
Register your server in the MCP client configuration:
For stdio transport:
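The exact configuration schema depends on your runtime, so treat the following as a sketch only. It builds one HTTP entry and one stdio entry and prints them as JSON. The top-level `mcpServers` key, the `transport` field, the server names, the URL, and the `server.py` path are all assumptions.

```python
import json

# Every key and value here is illustrative; check your runtime's config schema.
config = {
    "mcpServers": {
        "number-guess-http": {
            "transport": "http",
            "url": "http://localhost:8000/mcp",   # assumed local endpoint
        },
        "number-guess-stdio": {
            "transport": "stdio",
            "command": "python",
            "args": ["server.py"],                # assumed server entry point
        },
    }
}

print(json.dumps(config, indent=2))
```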
Step 6: Run Your Environment Locally
Start the MCP Server (HTTP mode)
For HTTP deployment, wrap your server with the HTTP session manager:
Run it:
Test with the MCP CLI
Before wiring into the RL harness, test your server:
Run a Training Job
Once everything works:
Advanced Patterns
Multi-Step Episodes
Extend the wrapper to support multiple steps before episode termination:
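One possible shape for this, as a sketch: the wrapper counts steps locally and ends the episode when the server reports a correct guess or a step budget runs out. The `call_tool` callable, the `max_steps` parameter, and the payload fields are assumptions.

```python
import json
from typing import Callable


class MultiStepGuessEnv:
    """Episode ends on a correct guess or after max_steps tool calls."""

    def __init__(self, call_tool: Callable[[str, dict], str], max_steps: int = 10):
        self._call_tool = call_tool      # hypothetical: (tool name, arguments) -> JSON string
        self._max_steps = max_steps
        self._steps = 0

    def reset(self) -> str:
        self._steps = 0
        return "Guess the hidden number."

    def step(self, action: dict):
        self._steps += 1
        payload = json.loads(self._call_tool("guess", action))
        solved = payload.get("feedback") == "correct"
        done = solved or self._steps >= self._max_steps
        # 'truncated' separates running out of steps from solving the task.
        info = {"steps": self._steps, "truncated": done and not solved}
        reward = float(payload.get("reward", 0.0))
        return payload.get("feedback", ""), reward, done, info
```

The step counter is the only state the wrapper keeps; the game state itself still lives in the MCP server.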
Shaped Rewards
Return incremental rewards to guide learning:
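A simple shaping scheme, offered as one option among many: give full reward for a correct guess and a small bonus or penalty for moving closer to or further from the target than the previous guess. The function name, the 0.5 scale, and the clipping bounds are assumptions.

```python
from typing import Optional


def shaped_reward(guess: int, target: int, low: int, high: int,
                  prev_guess: Optional[int] = None) -> float:
    """Return 1.0 for a correct guess, else a bounded progress signal."""
    if guess == target:
        return 1.0
    if prev_guess is None:
        return 0.0                       # no baseline to compare against yet
    span = max(high - low, 1)            # avoid division by zero
    # Positive when this guess is closer to the target than the previous one.
    improvement = abs(prev_guess - target) - abs(guess - target)
    reward = 0.5 * improvement / span
    return max(-0.5, min(0.5, reward))   # keep shaping below the success reward
```

The reward magnitude reveals distance to the target, so compute it on the server and consider whether it should appear in agent-visible observations at all.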
Async Tool Calls
For concurrent environments, use async MCP calls:
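A sketch of the idea using `asyncio.gather`: one tool call per session is issued concurrently and the results are collected by session ID. `fake_call_tool` is a stand-in for whatever async client call your MCP library provides; its signature is an assumption.

```python
import asyncio
import json


async def fake_call_tool(session_id: str, name: str, arguments: dict) -> str:
    """Stand-in for an async MCP client call (hypothetical signature)."""
    await asyncio.sleep(0.01)            # simulate network latency
    return json.dumps({"session_id": session_id, "echo": arguments})


async def step_all(call_tool, actions: dict) -> dict:
    """Issue one 'guess' call per session concurrently; key results by session ID."""
    session_ids = list(actions)
    results = await asyncio.gather(
        *(call_tool(sid, "guess", actions[sid]) for sid in session_ids)
    )
    # gather() preserves input order, so zip pairs each result with its session.
    return {sid: json.loads(raw) for sid, raw in zip(session_ids, results)}
```

Usage: `asyncio.run(step_all(fake_call_tool, {"a": {"guess": 1}, "b": {"guess": 2}}))`.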
Session Cleanup
Add session cleanup to prevent memory leaks:
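One way to do this is a time-to-live sweep: record when each session was last used and drop those idle longer than a threshold. The `SessionStore` class, its method names, and the 600-second default are assumptions for illustration.

```python
import threading
import time
from typing import Optional


class SessionStore:
    """Session dictionary with last-access timestamps and TTL-based cleanup."""

    def __init__(self, ttl_seconds: float = 600.0):
        self._ttl = ttl_seconds
        self._lock = threading.Lock()
        self._sessions = {}              # session_id -> (last_seen, state dict)

    def touch(self, session_id: str, now: Optional[float] = None) -> dict:
        """Return the session state, creating it if needed, and refresh its timestamp."""
        now = time.monotonic() if now is None else now
        with self._lock:
            _, state = self._sessions.get(session_id, (now, {}))
            self._sessions[session_id] = (now, state)
            return state

    def cleanup(self, now: Optional[float] = None) -> int:
        """Remove sessions idle longer than the TTL; return how many were removed."""
        now = time.monotonic() if now is None else now
        with self._lock:
            expired = [sid for sid, (seen, _) in self._sessions.items()
                       if now - seen > self._ttl]
            for sid in expired:
                del self._sessions[sid]
        return len(expired)

    def __len__(self) -> int:
        with self._lock:
            return len(self._sessions)
```

`cleanup()` can be called from a periodic background task or opportunistically at the start of each tool call.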
Debugging Tips
1. Check Session Persistence
Verify sessions persist across tool calls:
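A quick check, sketched under the assumption that the tool response includes a per-session `attempts` counter: call the same tool twice with the same session and confirm the counter advanced by one. The `call_tool` signature is hypothetical.

```python
import json


def check_persistence(call_tool, session_id: str) -> bool:
    """Return True if the server's attempt counter advances across two calls."""
    first = json.loads(call_tool(session_id, "guess", {"guess": 1}))
    second = json.loads(call_tool(session_id, "guess", {"guess": 2}))
    return second["attempts"] == first["attempts"] + 1
```

If this returns False, the server is likely creating a fresh session on every request.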
2. Inspect MCP Responses
Log raw MCP responses to understand structure:
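A thin logging shim around the tool call is often enough. The sketch below assumes a synchronous `call_tool(name, arguments)` callable that returns the raw response; the logger name is arbitrary.

```python
import json
import logging

logger = logging.getLogger("mcp_env.debug")


def logged_call(call_tool, name: str, arguments: dict):
    """Call the tool, log the raw response at DEBUG level, and return it unchanged."""
    raw = call_tool(name, arguments)
    logger.debug("tool=%s args=%s raw=%r", name, json.dumps(arguments), raw)
    return raw
```

Enable it with `logging.basicConfig(level=logging.DEBUG)` while debugging, and leave it at a higher level in training runs.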
3. Validate Session IDs
Ensure the server receives stable session IDs:
4. Monitor Memory Usage
Track session count and history size:
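A small helper can summarize both numbers. It assumes sessions are stored as a dictionary of dictionaries, each with an optional `history` list; adapt it to your own state class.

```python
def session_stats(sessions: dict) -> dict:
    """Summarize session count and history sizes for logging or metrics."""
    sizes = [len(state.get("history", [])) for state in sessions.values()]
    return {
        "session_count": len(sessions),
        "total_history": sum(sizes),
        "max_history": max(sizes, default=0),   # 0 when there are no sessions
    }
```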
Best Practices
Security
- Validate inputs: Check tool arguments before processing
- Sanitize outputs: Don't leak sensitive state (like target numbers)
- Rate limiting: Add request limits to prevent abuse
- CORS policies: Configure allowed origins for HTTP servers
Performance
- Trim history: Limit stored history to prevent unbounded growth
- Lazy initialization: Create sessions on-demand
- Async operations: Use async/await for I/O-bound operations
- Session cleanup: Remove inactive sessions periodically
Testing
- Unit test state logic: Test SessionState methods independently
- Integration test tools: Verify tool calls return expected structures

- Load test sessions: Ensure server handles concurrent sessions
- Validate dataset: Check generated tasks are well-formed
Common Pitfalls
1. Forgetting stateless=False
For HTTP servers, you must create the HTTP session manager with stateless=False.
Without this, sessions won't persist!
2. Not Trimming History
Always limit history size:
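For a plain list, trimming in place after each append is enough. This is a minimal sketch; `MAX_HISTORY` and the helper name are assumptions.

```python
MAX_HISTORY = 100  # illustrative cap


def append_trimmed(history: list, entry: dict, max_len: int = MAX_HISTORY) -> None:
    """Append an entry, then drop the oldest entries beyond max_len."""
    history.append(entry)
    if len(history) > max_len:
        # Delete from the front so the most recent entries are kept.
        del history[: len(history) - max_len]
```

`collections.deque(maxlen=...)` gives the same behavior automatically if you don't need a list.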
3. Leaking Private State
Never include secret state in public observations:
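One defensive pattern is to build observations from an allowlist of public keys rather than removing known secrets, so a newly added private field cannot leak by default. The key names below are assumptions based on the guessing game.

```python
# Only these keys may ever reach the agent (illustrative names).
PUBLIC_KEYS = ("low", "high", "attempts", "history")


def to_observation(state: dict) -> dict:
    """Copy only allowlisted keys from the session state."""
    return {key: state[key] for key in PUBLIC_KEYS if key in state}
```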
4. Blocking Async Operations
Use async properly:
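The usual mistake is calling blocking code (such as `time.sleep` or a synchronous database call) directly inside an async handler, which stalls every other session on the same event loop. A sketch of the fix uses `asyncio.to_thread` (Python 3.9+); `slow_scoring` is a made-up stand-in for blocking work.

```python
import asyncio
import time


def slow_scoring(guess: int) -> int:
    """Stand-in for blocking work such as a synchronous library call."""
    time.sleep(0.05)
    return guess * 2


async def handle_guess(guess: int) -> int:
    # Run the blocking function in a worker thread so the event loop stays free.
    return await asyncio.to_thread(slow_scoring, guess)


async def main() -> list:
    # The four handlers overlap instead of running back to back.
    return await asyncio.gather(*(handle_guess(g) for g in range(4)))
```

For waits that are already async-friendly, prefer the native awaitable (for example `await asyncio.sleep(...)`) over a thread.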
Next Steps
Now that you understand stateful MCP environments, you can:
- Build complex tasks: Multi-step games, resource management, dialogue
- Add persistence: Save sessions to database for long-running episodes
- Implement multiplayer: Support multi-agent interactions in shared sessions
- Create benchmarks: Design evaluation suites for your environments
Getting Help
If you encounter issues:
- Check server logs for errors
- Verify MCP client configuration
- Test tools with MCP CLI before integrating
- Review session ID extraction logic
- Contact support with session dumps and error logs