
Streaming Responses

TokenRouter supports streaming responses using Server-Sent Events (SSE), allowing you to receive text as it’s generated rather than waiting for the complete response. This provides a better user experience for long-form content generation.

When streaming is enabled:

  1. The request is sent with `stream: true`
  2. The server opens an SSE connection
  3. Events are sent as they’re generated:
    • `metadata` - Provider and model information
    • `message.start` - Response begins
    • `content.delta` - Text chunks (multiple)
    • `usage` - Token consumption (optional)
    • `message.end` - Response complete
    • `done` - Stream terminator
Basic usage with the Node SDK:

```javascript
import Tokenrouter from 'tokenrouter';

const client = new Tokenrouter({
  apiKey: process.env.TOKENROUTER_API_KEY
});

const stream = await client.responses.create({
  model: 'auto:balance',
  input: 'Write a short story about a robot',
  stream: true
});

for await (const chunk of stream) {
  if (chunk.event === 'content.delta') {
    process.stdout.write(chunk.delta.text);
  }
}
```

The `metadata` event is sent first, with routing information:

```
event: metadata
data: {"provider":"openai","model":"gpt-4o-2024-11-20","routing_mode":"balance"}
```

The `message.start` event marks the beginning of the response:

```
event: message.start
data: {"type":"message","role":"assistant"}
```

`content.delta` events contain text chunks as they’re generated (sent multiple times):

```
event: content.delta
data: {"delta":{"text":"Once"}}

event: content.delta
data: {"delta":{"text":" upon"}}

event: content.delta
data: {"delta":{"text":" a"}}

event: content.delta
data: {"delta":{"text":" time"}}
```
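Concatenating the `delta.text` fields in arrival order reconstructs the response text. For the four frames above:

```javascript
// Reassemble streamed text by joining delta payloads in order.
const deltas = [
  { delta: { text: 'Once' } },
  { delta: { text: ' upon' } },
  { delta: { text: ' a' } },
  { delta: { text: ' time' } }
];

const text = deltas.map((d) => d.delta.text).join('');
console.log(text); // "Once upon a time"
```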

The `usage` event carries token consumption statistics:

```
event: usage
data: {"input_tokens":15,"output_tokens":128,"total_tokens":143}
```

The `message.end` event marks completion of the message:

```
event: message.end
data: {"type":"message_end"}
```

The `done` event is the stream terminator and is always the last event:

```
event: done
data: null
```
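The SDK decodes these frames for you, but on the wire each event is just an `event:` line and a `data:` line separated by a blank line. A minimal parser, independent of the SDK, might look like this (a sketch that assumes one `data:` line per frame, as in the examples above):

```javascript
// Minimal SSE frame parser (sketch): splits a raw event-stream body
// into { event, data } objects. Assumes well-formed frames with a
// single data: line each, separated by blank lines.
function parseSSE(raw) {
  return raw
    .split('\n\n')
    .filter((frame) => frame.trim().length > 0)
    .map((frame) => {
      const fields = {};
      for (const line of frame.split('\n')) {
        const idx = line.indexOf(':');
        if (idx === -1) continue; // Skip comments/malformed lines
        fields[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
      }
      return { event: fields.event, data: JSON.parse(fields.data) };
    });
}
```

A production parser would also handle multi-line `data:` fields and comment lines, which the SSE format allows.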
A complete handler for every event type:

```javascript
const stream = await client.responses.create({
  model: 'auto:balance',
  input: 'Explain quantum computing',
  stream: true
});

let fullText = '';
let metadata = null;
let usage = null;

for await (const chunk of stream) {
  switch (chunk.event) {
    case 'metadata':
      metadata = chunk.metadata;
      console.log(`Using ${metadata.provider} (${metadata.model})`);
      break;
    case 'message.start':
      console.log('Response starting...\n');
      break;
    case 'content.delta': {
      const text = chunk.delta.text;
      fullText += text;
      process.stdout.write(text);
      break;
    }
    case 'usage':
      usage = chunk.usage;
      break;
    case 'message.end':
      console.log('\n\nResponse complete');
      break;
    case 'done':
      console.log(`\nTokens used: ${usage?.total_tokens || 'unknown'}`);
      break;
  }
}

console.log(`\nFull response length: ${fullText.length} characters`);
```

Errors during streaming are sent as `error` events:

```javascript
try {
  const stream = await client.responses.create({
    model: 'auto:balance',
    input: 'My SSN is 123-45-6789', // May be blocked by firewall
    stream: true
  });

  for await (const chunk of stream) {
    if (chunk.event === 'error') {
      console.error('Stream error:', chunk.error.message);
      break;
    }
    if (chunk.event === 'content.delta') {
      process.stdout.write(chunk.delta.text);
    }
  }
} catch (error) {
  console.error('Failed to start stream:', error.message);
}
```

For UI updates, buffer text for smoother rendering:

```javascript
let buffer = '';
let lastUpdate = Date.now();
const UPDATE_INTERVAL = 50; // ms

for await (const chunk of stream) {
  if (chunk.event === 'content.delta') {
    buffer += chunk.delta.text;
    // Flush to the UI at most once per interval
    if (Date.now() - lastUpdate > UPDATE_INTERVAL) {
      updateUI(buffer);
      buffer = '';
      lastUpdate = Date.now();
    }
  }
  if (chunk.event === 'done' && buffer) {
    updateUI(buffer); // Flush remaining buffer
  }
}
```
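The same throttling can be factored into a small helper. Here is a sketch with an injectable clock so it can be unit-tested without real timers; `onFlush` stands in for whatever UI update function your framework provides:

```javascript
// Throttled text buffer (sketch): accumulates delta text and flushes
// it via onFlush at most once per intervalMs of the injected clock.
class ThrottledBuffer {
  constructor(onFlush, intervalMs = 50, clock = Date.now) {
    this.onFlush = onFlush;
    this.intervalMs = intervalMs;
    this.clock = clock;
    this.buffer = '';
    this.lastFlush = clock();
  }

  // Call on every content.delta event.
  push(text) {
    this.buffer += text;
    if (this.clock() - this.lastFlush > this.intervalMs) {
      this.flush();
    }
  }

  // Call on the done event to emit whatever is left.
  flush() {
    if (this.buffer) {
      this.onFlush(this.buffer);
      this.buffer = '';
    }
    this.lastFlush = this.clock();
  }
}
```

Call `push` on each `content.delta` and `flush` once on `done`, mirroring the loop above.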

Monitor token generation rate:

```javascript
let tokens = 0;
const startTime = Date.now();

for await (const chunk of stream) {
  if (chunk.event === 'content.delta') {
    tokens++; // Counts delta events, a rough proxy for tokens
    const elapsed = (Date.now() - startTime) / 1000;
    const tokensPerSecond = tokens / elapsed;
    console.log(`Rate: ${tokensPerSecond.toFixed(1)} tokens/sec`);
  }
}
```

Build the complete response while streaming:

```typescript
interface StreamResult {
  text: string;
  provider: string;
  model: string;
  usage: {
    input_tokens: number;
    output_tokens: number;
    total_tokens: number;
  };
}

async function streamWithMetadata(input: string): Promise<StreamResult> {
  const stream = await client.responses.create({
    model: 'auto:balance',
    input,
    stream: true
  });

  let text = '';
  let provider = '';
  let model = '';
  let usage = { input_tokens: 0, output_tokens: 0, total_tokens: 0 };

  for await (const chunk of stream) {
    if (chunk.event === 'metadata') {
      provider = chunk.metadata.provider;
      model = chunk.metadata.model;
    }
    if (chunk.event === 'content.delta') {
      text += chunk.delta.text;
    }
    if (chunk.event === 'usage') {
      usage = chunk.usage;
    }
  }

  return { text, provider, model, usage };
}

const result = await streamWithMetadata('Tell me a joke');
console.log(result);
```

All TokenRouter providers support streaming:

| Provider | Streaming Support | Notes |
| --- | --- | --- |
| OpenAI | ✅ Full | Native SSE support |
| Anthropic | ✅ Full | Native SSE support |
| Google Gemini | ✅ Full | Converted to SSE format |
| Mistral | ✅ Full | Native SSE support |
| DeepSeek | ✅ Full | OpenAI-compatible SSE |

Streaming responses use these headers:

```
Content-Type: text/event-stream
Cache-Control: no-cache
X-Accel-Buffering: no
Connection: keep-alive
```
Best practices for consuming streams:

  1. Always handle the `done` event - It signals stream completion
  2. Buffer for UI updates - Don’t update on every delta
  3. Handle errors gracefully - Check for `error` events
  4. Set appropriate timeouts - Streaming can be long-running
  5. Close connections properly - Clean up when cancelled
  6. Track metadata - Store provider/model info for analytics
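Points 4 and 5 in practice: long streams should run under a deadline, and the standard `AbortController` is the usual tool. Note that passing a `signal` option to the SDK is an assumption here (fetch-based SDKs commonly accept one; check your SDK version):

```javascript
// Cancel a long-running stream after a deadline with AbortController.
const controller = new AbortController();
const timer = setTimeout(() => controller.abort(), 60_000); // 60s cap

// Hypothetical SDK usage - the request-options `signal` parameter is
// an assumption, not confirmed TokenRouter API:
// const stream = await client.responses.create(
//   { model: 'auto:balance', input: 'Write a poem', stream: true },
//   { signal: controller.signal }
// );
// ...consume the stream, then clearTimeout(timer) in a finally block.

console.log(controller.signal.aborted); // false until abort() fires
clearTimeout(timer);
controller.abort();
console.log(controller.signal.aborted); // true
```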
The same request options apply when streaming.

With system instructions:

```javascript
const stream = await client.responses.create({
  model: 'auto:balance',
  input: 'Write a poem',
  instructions: 'You are a professional poet. Write in haiku format.',
  stream: true
});
```

With a higher temperature:

```javascript
const stream = await client.responses.create({
  model: 'auto:balance',
  input: 'Generate creative ideas',
  temperature: 1.5, // More creative
  stream: true
});
```

With an output token limit:

```javascript
const stream = await client.responses.create({
  model: 'auto:balance',
  input: 'Summarize quantum physics',
  max_output_tokens: 500,
  stream: true
});
```
If the stream stops unexpectedly:

  • Check for `error` events in the stream
  • Verify network connectivity
  • Check firewall rules (may block content)
  • Ensure proper timeout settings

If no events arrive at all:

  • Verify `stream: true` is set
  • Check authentication (API key valid)
  • Confirm the model is available
  • Review request parameters

If streaming is slow:

  • Check network latency
  • Consider using `auto:latency` mode
  • Verify provider status
  • Test with different providers
Typical performance characteristics:

  • Latency: First token typically arrives in 200-500ms
  • Throughput: ~10-50 tokens/second depending on provider
  • Buffering: Client buffering can add 50-100ms
  • Network: Poor connectivity affects stream consistency
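These figures compose into a rough estimate of total stream time. As an illustration (the specific numbers are picked from the ranges above, not measured):

```javascript
// Back-of-envelope total-time estimate for a streamed response.
const firstTokenLatencySec = 0.3;  // within the 200-500ms range above
const throughputTokensPerSec = 30; // within the ~10-50 tokens/sec range
const outputTokens = 500;

const totalSec = firstTokenLatencySec + outputTokens / throughputTokensPerSec;
console.log(totalSec.toFixed(1)); // "17.0"
```

This is also why streaming matters for long outputs: the user sees the first words in well under a second instead of waiting the full duration.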