Streaming Responses
Overview
TokenRouter supports streaming responses using Server-Sent Events (SSE), allowing you to receive text as it’s generated rather than waiting for the complete response. This provides a better user experience for long-form content generation.
How Streaming Works
When streaming is enabled:

1. The request is sent with `stream: true`
2. The server opens an SSE connection
3. Events are sent as they’re generated:
   - `metadata` - Provider and model information
   - `message.start` - Response begins
   - `content.delta` - Text chunks (multiple)
   - `usage` - Token consumption (optional)
   - `message.end` - Response complete
   - `done` - Stream terminator
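The event framing above is standard SSE: each event is an `event:` line plus a `data:` line, with a blank line between events. As a rough illustration of how that wire format maps to the event sequence, here is a hand-rolled parser; `parse_sse` is purely illustrative (the SDKs handle this for you, including multi-line `data:` fields this sketch ignores):

```python
import json

def parse_sse(raw: str):
    """Split a raw SSE body into (event, data) pairs."""
    events = []
    for block in raw.strip().split("\n\n"):
        event, data = None, None
        for line in block.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                payload = line[len("data:"):].strip()
                data = None if payload == "null" else json.loads(payload)
        events.append((event, data))
    return events

raw = (
    "event: metadata\n"
    'data: {"provider":"openai","model":"gpt-4o-2024-11-20","routing_mode":"balance"}\n'
    "\n"
    "event: content.delta\n"
    'data: {"delta":{"text":"Once"}}\n'
    "\n"
    "event: done\n"
    "data: null\n"
)

for event, data in parse_sse(raw):
    print(event, data)
```

In practice you should not parse the stream yourself; the SDK examples below yield already-parsed chunk objects.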
Basic Streaming
```typescript
import Tokenrouter from 'tokenrouter';

const client = new Tokenrouter({ apiKey: process.env.TOKENROUTER_API_KEY });

const stream = await client.responses.create({
  model: 'auto:balance',
  input: 'Write a short story about a robot',
  stream: true
});

for await (const chunk of stream) {
  if (chunk.event === 'content.delta') {
    process.stdout.write(chunk.delta.text);
  }
}
```

```python
import os

from tokenrouter import Tokenrouter

client = Tokenrouter(api_key=os.getenv("TOKENROUTER_API_KEY"))

stream = client.responses.create(
    model="auto:balance",
    input="Write a short story about a robot",
    stream=True,
)

for chunk in stream:
    if chunk.event == "content.delta":
        print(chunk.delta.text, end="", flush=True)
```

```sh
curl https://api.tokenrouter.io/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer tr_..." \
  -d '{
    "model": "auto:balance",
    "input": "Write a short story about a robot",
    "stream": true
  }'
```

Event Types
metadata Event

Sent first with routing information:

```
event: metadata
data: {"provider":"openai","model":"gpt-4o-2024-11-20","routing_mode":"balance"}
```

message.start Event

Marks the beginning of the response:

```
event: message.start
data: {"type":"message","role":"assistant"}
```

content.delta Event

Contains text chunks as they’re generated (sent multiple times):

```
event: content.delta
data: {"delta":{"text":"Once"}}

event: content.delta
data: {"delta":{"text":" upon"}}

event: content.delta
data: {"delta":{"text":" a"}}

event: content.delta
data: {"delta":{"text":" time"}}
```

usage Event (Optional)

Token consumption statistics:

```
event: usage
data: {"input_tokens":15,"output_tokens":128,"total_tokens":143}
```

message.end Event

Marks completion of the message:

```
event: message.end
data: {"type":"message_end"}
```

done Event

Stream terminator - always the last event:

```
event: done
data: null
```

Complete Stream Example
```typescript
const stream = await client.responses.create({
  model: 'auto:balance',
  input: 'Explain quantum computing',
  stream: true
});

let fullText = '';
let metadata = null;
let usage = null;

for await (const chunk of stream) {
  switch (chunk.event) {
    case 'metadata':
      metadata = chunk.metadata;
      console.log(`Using ${metadata.provider} (${metadata.model})`);
      break;

    case 'message.start':
      console.log('Response starting...\n');
      break;

    case 'content.delta': {
      const text = chunk.delta.text;
      fullText += text;
      process.stdout.write(text);
      break;
    }

    case 'usage':
      usage = chunk.usage;
      break;

    case 'message.end':
      console.log('\n\nResponse complete');
      break;

    case 'done':
      console.log(`\nTokens used: ${usage?.total_tokens || 'unknown'}`);
      break;
  }
}

console.log(`\nFull response length: ${fullText.length} characters`);
```

```python
stream = client.responses.create(
    model="auto:balance",
    input="Explain quantum computing",
    stream=True,
)

full_text = ""
metadata = None
usage = None

for chunk in stream:
    if chunk.event == "metadata":
        metadata = chunk.metadata
        print(f"Using {metadata['provider']} ({metadata['model']})")

    elif chunk.event == "message.start":
        print("Response starting...\n")

    elif chunk.event == "content.delta":
        text = chunk.delta.text
        full_text += text
        print(text, end="", flush=True)

    elif chunk.event == "usage":
        usage = chunk.usage

    elif chunk.event == "message.end":
        print("\n\nResponse complete")

    elif chunk.event == "done":
        if usage:
            print(f"\nTokens used: {usage['total_tokens']}")

print(f"\nFull response length: {len(full_text)} characters")
```

Error Handling
Errors during streaming are sent as error events:
```typescript
try {
  const stream = await client.responses.create({
    model: 'auto:balance',
    input: 'My SSN is 123-45-6789', // May be blocked by firewall
    stream: true
  });

  for await (const chunk of stream) {
    if (chunk.event === 'error') {
      console.error('Stream error:', chunk.error.message);
      break;
    }

    if (chunk.event === 'content.delta') {
      process.stdout.write(chunk.delta.text);
    }
  }
} catch (error) {
  console.error('Failed to start stream:', error.message);
}
```

```python
try:
    stream = client.responses.create(
        model="auto:balance",
        input="My SSN is 123-45-6789",  # May be blocked
        stream=True,
    )

    for chunk in stream:
        if chunk.event == "error":
            print(f"Stream error: {chunk.error.message}")
            break

        if chunk.event == "content.delta":
            print(chunk.delta.text, end="", flush=True)
except Exception as error:
    print(f"Failed to start stream: {error}")
```

Advanced Usage
Section titled “Advanced Usage”Buffer Management
For UI updates, buffer text for smoother rendering:
```typescript
let buffer = '';
let lastUpdate = Date.now();
const UPDATE_INTERVAL = 50; // ms

for await (const chunk of stream) {
  if (chunk.event === 'content.delta') {
    buffer += chunk.delta.text;

    // Update UI every 50ms
    if (Date.now() - lastUpdate > UPDATE_INTERVAL) {
      updateUI(buffer);
      buffer = '';
      lastUpdate = Date.now();
    }
  }

  if (chunk.event === 'done' && buffer) {
    updateUI(buffer); // Flush remaining buffer
  }
}
```

```python
import time

buffer = ""
last_update = time.time()
UPDATE_INTERVAL = 0.05  # seconds

for chunk in stream:
    if chunk.event == "content.delta":
        buffer += chunk.delta.text

        # Update UI every 50ms
        if time.time() - last_update > UPDATE_INTERVAL:
            update_ui(buffer)
            buffer = ""
            last_update = time.time()

    if chunk.event == "done" and buffer:
        update_ui(buffer)  # Flush remaining buffer
```

Track Progress
Monitor token generation rate:
```typescript
let tokens = 0;
const startTime = Date.now();

for await (const chunk of stream) {
  if (chunk.event === 'content.delta') {
    tokens++; // Approximate: counts deltas, not exact tokens
    const elapsed = (Date.now() - startTime) / 1000;
    const tokensPerSecond = tokens / elapsed;

    console.log(`Rate: ${tokensPerSecond.toFixed(1)} tokens/sec`);
  }
}
```

```python
import time

tokens = 0
start_time = time.time()

for chunk in stream:
    if chunk.event == "content.delta":
        tokens += 1  # Approximate: counts deltas, not exact tokens
        elapsed = time.time() - start_time
        tokens_per_second = tokens / elapsed

        print(f"Rate: {tokens_per_second:.1f} tokens/sec")
```

Accumulate Full Response
Build the complete response while streaming:
```typescript
interface StreamResult {
  text: string;
  provider: string;
  model: string;
  usage: {
    input_tokens: number;
    output_tokens: number;
    total_tokens: number;
  };
}

async function streamWithMetadata(input: string): Promise<StreamResult> {
  const stream = await client.responses.create({
    model: 'auto:balance',
    input,
    stream: true
  });

  let text = '';
  let provider = '';
  let model = '';
  let usage = { input_tokens: 0, output_tokens: 0, total_tokens: 0 };

  for await (const chunk of stream) {
    if (chunk.event === 'metadata') {
      provider = chunk.metadata.provider;
      model = chunk.metadata.model;
    }

    if (chunk.event === 'content.delta') {
      text += chunk.delta.text;
    }

    if (chunk.event === 'usage') {
      usage = chunk.usage;
    }
  }

  return { text, provider, model, usage };
}

const result = await streamWithMetadata('Tell me a joke');
console.log(result);
```

```python
from typing import TypedDict

class StreamResult(TypedDict):
    text: str
    provider: str
    model: str
    usage: dict

def stream_with_metadata(input_text: str) -> StreamResult:
    stream = client.responses.create(
        model="auto:balance",
        input=input_text,
        stream=True,
    )

    text = ""
    provider = ""
    model = ""
    usage = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}

    for chunk in stream:
        if chunk.event == "metadata":
            provider = chunk.metadata["provider"]
            model = chunk.metadata["model"]

        if chunk.event == "content.delta":
            text += chunk.delta.text

        if chunk.event == "usage":
            usage = chunk.usage

    return {"text": text, "provider": provider, "model": model, "usage": usage}

result = stream_with_metadata("Tell me a joke")
print(result)
```

Provider Compatibility
All TokenRouter providers support streaming:
| Provider | Streaming Support | Notes |
|---|---|---|
| OpenAI | ✅ Full | Native SSE support |
| Anthropic | ✅ Full | Native SSE support |
| Google Gemini | ✅ Full | Converted to SSE format |
| Mistral | ✅ Full | Native SSE support |
| DeepSeek | ✅ Full | OpenAI-compatible SSE |
HTTP Headers
Streaming responses use these headers:

```
Content-Type: text/event-stream
Cache-Control: no-cache
X-Accel-Buffering: no
Connection: keep-alive
```

Best Practices
- Always handle the `done` event - It signals stream completion
- Buffer for UI updates - Don’t update on every delta
- Handle errors gracefully - Check for error events
- Set appropriate timeouts - Streaming can be long-running
- Close connections properly - Clean up when cancelled
- Track metadata - Store provider/model info for analytics
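As a sketch of the cleanup point above: stopping consumption early should still release the underlying connection. The generator below stands in for the SDK stream object; whether the real stream exposes the same `close()` semantics is an assumption to verify against the SDK you use:

```python
from contextlib import closing

cleaned_up = False

def fake_stream():
    """Stand-in for an SDK stream; close() triggers the finally block."""
    global cleaned_up
    try:
        for i in range(1000):
            yield {"event": "content.delta", "delta": {"text": f"chunk {i}"}}
    finally:
        cleaned_up = True  # connection teardown would happen here

received = []
with closing(fake_stream()) as stream:
    for chunk in stream:
        received.append(chunk["delta"]["text"])
        if len(received) >= 4:  # e.g. the user pressed "stop"
            break

print(cleaned_up)  # the stream's cleanup ran even though we broke out early
```

Wrapping the stream in a context manager (or calling `close()` in a `finally`) guarantees teardown even when the consumer exits the loop early or raises.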
Streaming with Other Parameters
With System Instructions

```typescript
const stream = await client.responses.create({
  model: 'auto:balance',
  input: 'Write a poem',
  instructions: 'You are a professional poet. Write in haiku format.',
  stream: true
});
```

```python
stream = client.responses.create(
    model="auto:balance",
    input="Write a poem",
    instructions="You are a professional poet. Write in haiku format.",
    stream=True,
)
```

With Temperature Control

```typescript
const stream = await client.responses.create({
  model: 'auto:balance',
  input: 'Generate creative ideas',
  temperature: 1.5, // More creative
  stream: true
});
```

```python
stream = client.responses.create(
    model="auto:balance",
    input="Generate creative ideas",
    temperature=1.5,  # More creative
    stream=True,
)
```

With Max Tokens

```typescript
const stream = await client.responses.create({
  model: 'auto:balance',
  input: 'Summarize quantum physics',
  max_output_tokens: 500,
  stream: true
});
```

```python
stream = client.responses.create(
    model="auto:balance",
    input="Summarize quantum physics",
    max_output_tokens=500,
    stream=True,
)
```

Troubleshooting
Stream Stops Unexpectedly
- Check for error events in the stream
- Verify network connectivity
- Check firewall rules (may block content)
- Ensure proper timeout settings
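If the stream keeps dropping, one option is to restart the request with exponential backoff. This sketch uses plain dict chunks and a hypothetical `create_stream` factory standing in for `client.responses.create(..., stream=True)`; since deltas are not resumable, each attempt restarts accumulation from scratch:

```python
import time

def stream_with_retry(create_stream, max_attempts=3, base_delay=0.01):
    """Restart a dropped stream from scratch, with exponential backoff."""
    for attempt in range(max_attempts):
        text = ""  # deltas are not resumable, so restart accumulation
        try:
            for chunk in create_stream():
                if chunk["event"] == "content.delta":
                    text += chunk["delta"]["text"]
            return text  # stream finished cleanly
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # back off before retrying

# Demo with a stand-in stream that drops once, then succeeds.
attempts = {"n": 0}

def flaky_stream():
    attempts["n"] += 1
    yield {"event": "content.delta", "delta": {"text": "Hello"}}
    if attempts["n"] == 1:
        raise ConnectionError("connection dropped mid-stream")
    yield {"event": "content.delta", "delta": {"text": " world"}}

result = stream_with_retry(flaky_stream)
print(result)  # "Hello world", after one retry
```

Note that retrying re-sends the request, so it re-incurs token costs; only do this for idempotent workloads.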
No Content Received
- Verify `stream: true` is set
- Check authentication (API key valid)
- Confirm model is available
- Review request parameters
Slow Streaming
- Check network latency
- Consider using `auto:latency` mode
- Verify provider status
- Test with different providers
Performance Considerations
- Latency: First token typically arrives in 200-500ms
- Throughput: ~10-50 tokens/second depending on provider
- Buffering: Client buffering can add 50-100ms
- Network: Poor connectivity affects stream consistency
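To check these numbers for your own workload, time the first `content.delta` and the overall delta rate. The sketch below runs against a simulated stream of dict chunks (the real SDK yields chunk objects, so adapt the field access accordingly):

```python
import time

def measure(stream):
    """Collect time-to-first-token and rough throughput for a stream."""
    start = time.perf_counter()
    first_token_at = None
    tokens = 0
    for chunk in stream:
        if chunk["event"] == "content.delta":
            if first_token_at is None:
                first_token_at = time.perf_counter() - start
            tokens += 1  # approximate: one delta counted as one token
    elapsed = time.perf_counter() - start
    return {
        "ttft_s": first_token_at,
        "tokens": tokens,
        "tokens_per_s": tokens / elapsed if elapsed > 0 else 0.0,
    }

def simulated_stream(n=20, delay=0.001):
    """Stand-in stream emitting n deltas, then a done event."""
    for _ in range(n):
        time.sleep(delay)
        yield {"event": "content.delta", "delta": {"text": "x"}}
    yield {"event": "done"}

stats = measure(simulated_stream())
print(stats["tokens"], stats["ttft_s"], stats["tokens_per_s"])
```

For exact token counts, prefer the `usage` event over counting deltas.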
Next Steps
- Tool Calls - Use streaming with function calling
- Response Format - Structure streamed output
- Errors - Handle streaming errors
- Standard Requests - Non-streaming alternative