Advanced MCP: Streaming and Approval Gates
Users would ask Claude to "set up all my integrations," and Claude would call our provisioning tool. Then nothing. Users waited 45 seconds staring at a spinner while our server created API keys, configured webhooks, and set up OAuth clients. Most users gave up after 15 seconds, thinking it failed.
Then someone asked Claude to "clean up old bundles," and Claude dutifully deleted everything from the last 6 months. Because we let it.
We needed streaming for long-running work, chunking for large operations, and approval gates for anything scary. Here's what works.
The Problem: Tools That Take Too Long
Our "provision bundle" tool took 30-45 seconds. It had to:
- Create API credentials (5-8s)
- Set up webhooks (8-12s)
- Configure OAuth (10-15s)
- Validate everything (5-10s)
Users saw nothing for 45 seconds. They'd refresh the page, creating duplicate requests. Our database filled with half-configured bundles because users killed the process mid-flight.
Simple loading spinners didn't help—users needed to see progress or they assumed it broke.
Streaming: Show Progress, Keep Users Engaged
Streaming lets you send updates during execution. Instead of waiting 45 seconds for a final response, send status updates every few seconds.
Basic streaming example:
```python
import asyncio
from collections.abc import AsyncIterator

@mcp.tool(description="Provision a new bundle with progress updates")
async def provision_bundle(bundle_id: str) -> AsyncIterator[dict]:
    yield {"status": "starting", "message": "Creating API credentials..."}
    await asyncio.sleep(5)  # stand-in for the real credential work
    yield {"status": "progress", "message": "Setting up webhooks...", "percent": 33}
    await asyncio.sleep(8)
    yield {"status": "progress", "message": "Configuring OAuth...", "percent": 66}
    await asyncio.sleep(10)
    yield {"status": "progress", "message": "Validating configuration...", "percent": 90}
    await asyncio.sleep(5)
    yield {"status": "complete", "message": "Bundle provisioned successfully", "bundle_id": bundle_id}
```
Now users see:
- "Creating API credentials..." (immediately)
- "Setting up webhooks... 33%" (after 5s)
- "Configuring OAuth... 66%" (after 13s)
- "Validating configuration... 90%" (after 23s)
- "Bundle provisioned successfully" (after 28s)
What changed:
- Abandonment dropped from 40% to under 5%
- Support tickets about "broken provisioning" went to zero
- Users waited longer because they saw progress
Streaming tips:
Send updates every 2-5 seconds, not every 100ms. Too many updates slow down the UI.
Include percent complete when you can estimate it:
```python
yield {"status": "progress", "percent": 33, "message": "Step 2 of 4"}
```
Send actual status, not generic messages:
```python
# ✗ Vague
yield {"status": "working", "message": "Processing..."}

# ✓ Specific
yield {"status": "creating_webhooks", "message": "Registering 3 webhook endpoints"}
```
Chunking: Break Large Operations Into Pieces
We had a tool to "sync all provider credentials" that would check 200+ OAuth tokens. It took 90 seconds and timed out half the time. If it failed at token #195, it started over from token #1.
We chunked it:
```python
@mcp.tool(description="Sync provider credentials in batches")
async def sync_credentials(
    cursor: str | None = None,
    batch_size: int = 20
) -> dict:
    # Get the starting point
    start_index = int(cursor) if cursor else 0

    # Process one batch
    tokens = get_tokens(start_index, batch_size)
    results = []
    for token in tokens:
        result = await refresh_token(token)
        results.append(result)

    # Calculate next cursor
    next_index = start_index + len(tokens)
    has_more = next_index < total_token_count()
    next_cursor = str(next_index) if has_more else None

    return {
        "results": results,
        "processed": len(results),
        "next_cursor": next_cursor,
        "has_more": has_more
    }
```
Now Claude calls it like this:
```
sync_credentials()             → returns next_cursor="20"
sync_credentials(cursor="20")  → returns next_cursor="40"
...and so on, until has_more=false
```
What this fixed:
- Timeouts disappeared. Each chunk takes 5-8 seconds instead of 90.
- Failures are cheap. If chunk #7 fails, retry chunk #7, not all 200 tokens.
- Progress is visible. Claude tells users "Synced 60 of 200 credentials..."
Chunking tips:
Make chunks idempotent. If Claude calls the same cursor twice, it should be safe:
```python
# Store processed IDs to avoid duplicates
processed = set(get_processed_ids())
tokens_to_process = [t for t in tokens if t.id not in processed]
```
Keep chunk size reasonable—20-50 items is usually right. Too small means lots of calls, too large risks timeouts.
Return clear completion signals:
```python
return {
    "results": [...],
    "next_cursor": "page_3" if has_more else None,
    "progress": {"completed": 60, "total": 200, "percent": 30}
}
```
Store progress server-side so users can resume after closing the tab:
```python
# Save state
await db.save_sync_state(user_id, {"last_cursor": next_cursor})

# Resume later
last_state = await db.get_sync_state(user_id)
cursor = last_state.get("last_cursor") if last_state else None
```
Human-in-the-Loop: Don't Let Claude Delete Everything
Someone asked Claude to "clean up my old bundles." Claude called delete_bundle for every bundle older than 6 months. Gone. All of them. Including production ones the user needed.
We added approval gates:
```python
@mcp.tool(description="Delete a bundle (requires approval)")
async def delete_bundle(bundle_id: str) -> dict:
    # Get bundle details
    bundle = await get_bundle(bundle_id)

    # Request approval
    approved = await request_approval({
        "action": "delete_bundle",
        "bundle_name": bundle.name,
        "bundle_id": bundle_id,
        "warning": "This action cannot be undone",
        "details": {
            "created": bundle.created_at,
            "tools_count": len(bundle.tools),
            "last_used": bundle.last_used_at
        }
    })

    if not approved:
        return {"status": "cancelled", "message": "User declined deletion"}

    # User approved, proceed
    await actually_delete_bundle(bundle_id)
    return {"status": "deleted", "bundle_id": bundle_id}
```
Now when Claude wants to delete something, users see:
```
Claude wants to delete bundle "Production Slack Integration"

Details:
- Created: 2024-03-15
- Tools: 12
- Last used: 2 hours ago

WARNING: This action cannot be undone

[Cancel] [Approve]
```
What we require approval for:
- Deleting anything
- Revoking credentials or API keys
- Changing production settings
- Spending money (upgrading plans, purchasing add-ons)
- Sharing data externally
Approval tips:
Show a clear diff or summary:
```python
approval_payload = {
    "action": "update_webhook_url",
    "changes": {
        "old_url": "https://old.example.com/webhook",
        "new_url": "https://new.example.com/webhook"
    },
    "impact": "Webhooks will be sent to the new URL immediately"
}
```
Make the default safe. If approval fails or times out, cancel the operation.
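One way to implement that fail-closed behavior is to wrap the approval call in a timeout so that silence is never treated as consent. A minimal sketch; `approve_or_cancel` and its injected callback are illustrative, not part of the MCP spec:

```python
import asyncio
from collections.abc import Awaitable, Callable

async def approve_or_cancel(
    request_approval: Callable[[dict], Awaitable[bool]],
    payload: dict,
    timeout_s: float = 120.0,
) -> bool:
    """Fail closed: a timeout or transport error counts as a rejection."""
    try:
        return await asyncio.wait_for(request_approval(payload), timeout=timeout_s)
    except (asyncio.TimeoutError, ConnectionError):
        return False
```

Passing the approval callback in as a parameter also makes the fail-closed path easy to unit test with a coroutine that never answers.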
Log approvals for audit:
```python
from datetime import datetime, timezone

if approved:
    await log_audit_event({
        "user_id": user_id,
        "action": "delete_bundle",
        "bundle_id": bundle_id,
        "approved_at": datetime.now(timezone.utc),  # timezone-aware for audit trails
        "approved_by": approval_response.user_id
    })
```
Don't ask for approval on read-only operations. Only destructive or expensive ones.
Combining Patterns: Streaming + Chunking + Approval
For really complex operations, combine all three:
```python
from collections.abc import AsyncIterator

@mcp.tool(description="Migrate all bundles to new infrastructure")
async def migrate_bundles() -> AsyncIterator[dict]:
    bundles = await get_all_bundles()

    # Request upfront approval
    approved = await request_approval({
        "action": "migrate_all_bundles",
        "count": len(bundles),
        "warning": "This will temporarily disable bundles during migration",
        "estimated_time": f"{len(bundles) * 5} seconds"
    })
    if not approved:
        # Async generators can't `return` a value, so yield the status and stop
        yield {"status": "cancelled"}
        return

    # Stream progress while chunking
    for i in range(0, len(bundles), 10):
        chunk = bundles[i:i + 10]
        yield {
            "status": "migrating",
            "message": f"Migrating bundles {i + 1} to {i + len(chunk)}",
            "percent": int((i / len(bundles)) * 100)
        }
        for bundle in chunk:
            await migrate_single_bundle(bundle)
            await asyncio.sleep(1)  # Rate limiting

    yield {"status": "complete", "message": f"Migrated {len(bundles)} bundles"}
```
Users see:
- Approval request with details
- Progress updates every 10 bundles
- Final completion message
When Each Pattern Matters
Use streaming when:
- Operations take > 5 seconds
- You can show meaningful progress
- Users need reassurance it's working
Use chunking when:
- Processing > 50 items
- Individual items can fail independently
- Operations might timeout
- You need resumability
Use approval gates when:
- Actions are destructive or irreversible
- Operations cost money
- Changes affect production systems
- Multiple items will be modified
Don't overuse them:
- Streaming a 500ms operation is overkill
- Chunking 5 items is unnecessary
- Approval for reading data is annoying
Key Takeaways
- Users give up after 15 seconds without feedback—stream progress for long operations
- Streaming every 2-5 seconds is ideal—too frequent slows down the UI
- Chunk operations over 50 items and return cursors for resuming
- Make chunks idempotent so retries are safe
- Require approval for destructive operations—show clear diffs and warnings
- Log all approvals for audit trails
- Combine patterns for complex operations: approve upfront, chunk the work, stream progress
- Don't overuse patterns—simple operations should stay simple
Resources
- Anthropic MCP Docs – Streaming and session management
- FastMCP GitHub – Streaming examples and utilities
- MCP Specification – Progress notification spec
Long-running tools need progress updates. Dangerous tools need approval gates. Large operations need chunking. Use them wisely.