Streaming
The OpenResponses API supports streaming, so you can receive tokens as they're generated instead of waiting for the complete response. Set stream: true in your request, then read the response body as a stream of server-sent events (SSE). Each event carries a chunk of the response that you can display incrementally.
const apiKey = process.env.AI_GATEWAY_API_KEY;

const response = await fetch('https://ai-gateway.vercel.sh/v1/responses', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${apiKey}`,
  },
  body: JSON.stringify({
    model: 'google/gemini-3-flash',
    input: [
      {
        type: 'message',
        role: 'user',
        content: 'Write a haiku about debugging code.',
      },
    ],
    stream: true,
  }),
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // Network chunks can end mid-line, so buffer until complete lines arrive.
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // keep any trailing partial line for the next read

  for (const line of lines) {
    if (!line.startsWith('data:')) continue;
    const data = line.slice(5).trim();
    if (!data) continue;
    const event = JSON.parse(data);
    if (event.type === 'response.output_text.delta') {
      process.stdout.write(event.delta);
    }
  }
}

The stream emits these event types:

response.created - Response initialized
response.output_text.delta - Text chunk received
response.output_text.done - Text generation complete
response.completed - Full response complete with usage stats
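To react to the other event types as well, you can dispatch on event.type inside the same read loop. Here is a minimal sketch; the payload shapes beyond delta (for example, where the usage stats live on response.completed) are assumptions, so inspect the event objects your stream actually returns:

// Hypothetical dispatcher for the event types listed above.
function handleEvent(event) {
  switch (event.type) {
    case 'response.created':
      console.error('stream started'); // log to stderr so stdout stays clean
      break;
    case 'response.output_text.delta':
      process.stdout.write(event.delta); // incremental text chunk
      break;
    case 'response.output_text.done':
      process.stdout.write('\n'); // text generation finished
      break;
    case 'response.completed':
      // Assumed field path for the usage stats; verify against real events.
      console.error('usage:', JSON.stringify(event.response?.usage));
      break;
  }
}

Calling handleEvent(event) in place of the single if check above keeps the read loop focused on transport concerns while the dispatcher owns the event semantics.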