Advanced Configuration
Configure reasoning behavior for models that support extended thinking or chain-of-thought reasoning. The reasoning parameter allows you to control how reasoning tokens are generated and returned.
import OpenAI from 'openai';
const apiKey = process.env.AI_GATEWAY_API_KEY || process.env.VERCEL_OIDC_TOKEN;
const openai = new OpenAI({
apiKey,
baseURL: 'https://ai-gateway.vercel.sh/v1',
});
// @ts-expect-error - reasoning parameter not yet in OpenAI types
const completion = await openai.chat.completions.create({
model: 'anthropic/claude-sonnet-4.5',
messages: [
{
role: 'user',
content: 'What is the meaning of life? Think before answering.',
},
],
stream: false,
reasoning: {
max_tokens: 2000, // Limit reasoning tokens
enabled: true, // Enable reasoning
},
});
console.log('Reasoning:', completion.choices[0].message.reasoning);
console.log('Answer:', completion.choices[0].message.content);
console.log(
'Reasoning tokens:',
completion.usage.completion_tokens_details?.reasoning_tokens,
);

import os
from openai import OpenAI
api_key = os.getenv('AI_GATEWAY_API_KEY') or os.getenv('VERCEL_OIDC_TOKEN')
client = OpenAI(
api_key=api_key,
base_url='https://ai-gateway.vercel.sh/v1'
)
completion = client.chat.completions.create(
model='anthropic/claude-sonnet-4.5',
messages=[
{
'role': 'user',
'content': 'What is the meaning of life? Think before answering.'
}
],
stream=False,
extra_body={
'reasoning': {
'max_tokens': 2000,
'enabled': True
}
}
)
print('Reasoning:', completion.choices[0].message.reasoning)
print('Answer:', completion.choices[0].message.content)
print('Reasoning tokens:', completion.usage.completion_tokens_details.reasoning_tokens)

The reasoning object supports the following parameters:
- enabled (boolean, optional): Enable reasoning output. When true, the model will provide its reasoning process.
- max_tokens (number, optional): Maximum number of tokens to allocate for reasoning. This helps control costs and response times. Cannot be used with effort.
- effort (string, optional): Control the reasoning effort level. Cannot be used with max_tokens. Accepts:
  - 'none' - Disables reasoning
  - 'minimal' - ~10% of max_tokens
  - 'low' - ~20% of max_tokens
  - 'medium' - ~50% of max_tokens
  - 'high' - ~80% of max_tokens
  - 'xhigh' - ~95% of max_tokens
- exclude (boolean, optional): When true, excludes reasoning content from the response but still generates it internally. Useful for reducing response payload size.
Mutually exclusive parameters: You cannot specify both effort and
max_tokens in the same request. Choose one based on your use case.
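For example, to cap reasoning with an effort level rather than an explicit token budget, pass effort on its own. This is a minimal sketch mirroring the TypeScript example above; the 'low' level is an arbitrary illustrative choice:

import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.AI_GATEWAY_API_KEY,
  baseURL: 'https://ai-gateway.vercel.sh/v1',
});

// @ts-expect-error - reasoning parameter not yet in OpenAI types
const completion = await openai.chat.completions.create({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [
    {
      role: 'user',
      content: 'What is the meaning of life? Think before answering.',
    },
  ],
  stream: false,
  reasoning: {
    enabled: true,
    effort: 'low', // ~20% of the maximum; do not combine with max_tokens
  },
});

console.log('Answer:', completion.choices[0].message.content);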
When reasoning is enabled, the response includes reasoning content:
{
"id": "chatcmpl-123",
"object": "chat.completion",
"created": 1677652288,
"model": "anthropic/claude-sonnet-4.5",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The meaning of life is a deeply personal question...",
"reasoning": "Let me think about this carefully. The question asks about..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 15,
"completion_tokens": 150,
"total_tokens": 165,
"completion_tokens_details": {
"reasoning_tokens": 50
}
}
}

Reasoning content is streamed incrementally in the delta.reasoning field:
import OpenAI from 'openai';
const openai = new OpenAI({
apiKey: process.env.AI_GATEWAY_API_KEY,
baseURL: 'https://ai-gateway.vercel.sh/v1',
});
// @ts-expect-error - reasoning parameter not yet in OpenAI types
const stream = await openai.chat.completions.create({
model: 'anthropic/claude-sonnet-4.5',
messages: [
{
role: 'user',
content: 'What is the meaning of life? Think before answering.',
},
],
stream: true,
reasoning: {
enabled: true,
},
});
for await (const chunk of stream) {
const delta = chunk.choices[0]?.delta;
// Handle reasoning content
if (delta?.reasoning) {
process.stdout.write(`[Reasoning] ${delta.reasoning}`);
}
// Handle regular content
if (delta?.content) {
process.stdout.write(delta.content);
}
}

import os
from openai import OpenAI
client = OpenAI(
api_key=os.getenv('AI_GATEWAY_API_KEY'),
base_url='https://ai-gateway.vercel.sh/v1'
)
stream = client.chat.completions.create(
model='anthropic/claude-sonnet-4.5',
messages=[
{
'role': 'user',
'content': 'What is the meaning of life? Think before answering.'
}
],
stream=True,
extra_body={
'reasoning': {
'enabled': True
}
}
)
for chunk in stream:
if chunk.choices and chunk.choices[0].delta:
delta = chunk.choices[0].delta
# Handle reasoning content
if hasattr(delta, 'reasoning') and delta.reasoning:
print(f"[Reasoning] {delta.reasoning}", end='', flush=True)
# Handle regular content
if hasattr(delta, 'content') and delta.content:
print(delta.content, end='', flush=True)

The AI Gateway preserves reasoning details from models across interactions, normalizing the different formats used by OpenAI, Anthropic, and other providers into a consistent structure. This allows you to switch between models without rewriting your conversation management logic.
This is particularly useful during tool calling workflows where the model needs to resume its thought process after receiving tool results.
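For illustration, such a round trip could look like the sketch below. The weather tool, the stubbed tool result, and the assumption that echoing the assistant message (with its reasoning and reasoning_details fields) back into messages is what lets the model resume its reasoning are all illustrative, based on the behavior described above rather than on a prescribed API:

import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.AI_GATEWAY_API_KEY,
  baseURL: 'https://ai-gateway.vercel.sh/v1',
});

// An illustrative tool definition; any tool works the same way.
const tools = [
  {
    type: 'function' as const,
    function: {
      name: 'get_weather',
      description: 'Get the current weather for a city',
      parameters: {
        type: 'object',
        properties: { city: { type: 'string' } },
        required: ['city'],
      },
    },
  },
];

const messages: any[] = [
  { role: 'user', content: 'Should I bring an umbrella in Paris today? Think it through.' },
];

// @ts-expect-error - reasoning parameter not yet in OpenAI types
const first = await openai.chat.completions.create({
  model: 'anthropic/claude-sonnet-4.5',
  messages,
  tools,
  reasoning: { enabled: true },
});

const assistantMessage = first.choices[0].message;
const toolCall = assistantMessage.tool_calls?.[0];

if (toolCall) {
  // Echo the assistant message back unchanged so its reasoning_details travel
  // with the conversation, then append the tool result and continue.
  // @ts-expect-error - reasoning parameter not yet in OpenAI types
  const followUp = await openai.chat.completions.create({
    model: 'anthropic/claude-sonnet-4.5',
    messages: [
      ...messages,
      assistantMessage,
      {
        role: 'tool',
        tool_call_id: toolCall.id,
        content: JSON.stringify({ city: 'Paris', forecast: 'light rain' }), // stubbed tool output
      },
    ],
    tools,
    reasoning: { enabled: true },
  });

  console.log(followUp.choices[0].message.content);
}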
Controlling reasoning details
When reasoning.enabled is true (or when reasoning.exclude is not set),
responses include a reasoning_details array alongside the standard reasoning text field.
This structured field captures cryptographic signatures, encrypted content, and other verification
data that providers include with their reasoning output.
Each detail object contains:
- type: one or more of the below, depending on the provider and model:
  - 'reasoning.text': Contains the actual reasoning content as plain text in the text field. May include a signature field (Anthropic models) for cryptographic verification.
  - 'reasoning.encrypted': Contains encrypted or redacted reasoning content in the data field. Used by OpenAI models when reasoning is protected, or by Anthropic models when thinking is redacted. Preserves the encrypted payload for verification purposes.
  - 'reasoning.summary': Contains a condensed version of the reasoning process in the summary field. Used by OpenAI models to provide a readable summary alongside encrypted reasoning.
- id (optional): Unique identifier for the reasoning block, used for tracking and correlation
- format: Provider format identifier - 'openai-responses-v1', 'anthropic-claude-v1', or 'unknown'
- index (optional): Position in the reasoning sequence (for responses with multiple reasoning blocks)
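For illustration only, the shapes above could be written down as a TypeScript type along these lines (a hypothetical definition derived from the field list, not a type exported by the OpenAI SDK or the gateway):

type ReasoningDetailFormat = 'openai-responses-v1' | 'anthropic-claude-v1' | 'unknown';

type ReasoningDetail =
  | {
      type: 'reasoning.text';
      text: string;
      signature?: string; // cryptographic signature (Anthropic models)
      id?: string;
      format: ReasoningDetailFormat;
      index?: number;
    }
  | {
      type: 'reasoning.encrypted';
      data: string; // encrypted or redacted reasoning payload
      id?: string;
      format: ReasoningDetailFormat;
      index?: number;
    }
  | {
      type: 'reasoning.summary';
      summary: string; // condensed, readable version of the reasoning
      id?: string;
      format: ReasoningDetailFormat;
      index?: number;
    };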
Example response with reasoning details
For Anthropic models:
{
"id": "chatcmpl-123",
"object": "chat.completion",
"created": 1677652288,
"model": "anthropic/claude-sonnet-4.5",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The meaning of life is a deeply personal question...",
"reasoning": "Let me think about this carefully. The question asks about...",
"reasoning_details": [
{
"type": "reasoning.text",
"text": "Let me think about this carefully. The question asks about...",
"signature": "anthropic-signature-xyz",
"format": "anthropic-claude-v1",
"index": 0
}
]
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 15,
"completion_tokens": 150,
"total_tokens": 165,
"completion_tokens_details": {
"reasoning_tokens": 50
}
}
}

For OpenAI models (returns both summary and encrypted):
{
"id": "chatcmpl-456",
"object": "chat.completion",
"created": 1677652288,
"model": "openai/o3-mini",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "The answer is 42.",
"reasoning": "Let me calculate this step by step...",
"reasoning_details": [
{
"type": "reasoning.summary",
"summary": "Let me calculate this step by step...",
"format": "openai-responses-v1",
"index": 0
},
{
"type": "reasoning.encrypted",
"data": "encrypted_reasoning_content_xyz",
"format": "openai-responses-v1",
"index": 1
}
]
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 15,
"completion_tokens": 150,
"total_tokens": 165,
"completion_tokens_details": {
"reasoning_tokens": 50
}
}
}

Streaming reasoning details
When streaming, reasoning details are delivered incrementally in delta.reasoning_details:
For Anthropic models:
{
"id": "chatcmpl-123",
"object": "chat.completion.chunk",
"created": 1677652288,
"model": "anthropic/claude-sonnet-4.5",
"choices": [
{
"index": 0,
"delta": {
"reasoning": "Let me think.",
"reasoning_details": [
{
"type": "reasoning.text",
"text": "Let me think.",
"signature": "anthropic-signature-xyz",
"format": "anthropic-claude-v1",
"index": 0
}
]
},
"finish_reason": null
}
]
}

For OpenAI models (summary chunks during reasoning, then encrypted at end):
{
"id": "chatcmpl-456",
"object": "chat.completion.chunk",
"created": 1677652288,
"model": "openai/o3-mini",
"choices": [
{
"index": 0,
"delta": {
"reasoning": "Step 1:",
"reasoning_details": [
{
"type": "reasoning.summary",
"summary": "Step 1:",
"format": "openai-responses-v1",
"index": 0
}
]
},
"finish_reason": null
}
]
}

The AI Gateway automatically maps reasoning parameters to each provider's native format:
- OpenAI: Maps effort to reasoningEffort and controls summary detail
- Anthropic: Maps max_tokens to thinking budget tokens
- Google: Maps to thinkingConfig with budget and visibility settings
- Groq: Maps exclude to control reasoning format (hidden/parsed)
- xAI: Maps effort to reasoning effort levels
- Other providers: Generic mapping applied for compatibility
Automatic extraction: For models that don't natively support reasoning
output, the gateway automatically extracts reasoning
from <think> tags in the response.
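As a rough illustration (the exact output depends on the model), a raw completion such as <think>The user wants a short answer...</think>The answer is 42. would be surfaced with the tag contents in the reasoning field and the remaining text in content:

{
  "role": "assistant",
  "reasoning": "The user wants a short answer...",
  "content": "The answer is 42."
}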
The AI Gateway can route your requests across multiple AI providers for better reliability and performance. You can control which providers are used and in what order through the providerOptions parameter.
import OpenAI from 'openai';
const apiKey = process.env.AI_GATEWAY_API_KEY || process.env.VERCEL_OIDC_TOKEN;
const openai = new OpenAI({
apiKey,
baseURL: 'https://ai-gateway.vercel.sh/v1',
});
// @ts-expect-error - providerOptions is a gateway extension not in OpenAI types
const completion = await openai.chat.completions.create({
model: 'anthropic/claude-sonnet-4.5',
messages: [
{
role: 'user',
content:
'Tell me the history of the San Francisco Mission-style burrito in two paragraphs.',
},
],
stream: false,
// Provider options for gateway routing preferences
providerOptions: {
gateway: {
order: ['vertex', 'anthropic'], // Try Vertex AI first, then Anthropic
},
},
});
console.log('Assistant:', completion.choices[0].message.content);
console.log('Tokens used:', completion.usage);

import os
from openai import OpenAI
api_key = os.getenv('AI_GATEWAY_API_KEY') or os.getenv('VERCEL_OIDC_TOKEN')
client = OpenAI(
api_key=api_key,
base_url='https://ai-gateway.vercel.sh/v1'
)
completion = client.chat.completions.create(
model='anthropic/claude-sonnet-4.5',
messages=[
{
'role': 'user',
'content': 'Tell me the history of the San Francisco Mission-style burrito in two paragraphs.'
}
],
stream=False,
# Provider options for gateway routing preferences
extra_body={
'providerOptions': {
'gateway': {
'order': ['vertex', 'anthropic'] # Try Vertex AI first, then Anthropic
}
}
}
)
print('Assistant:', completion.choices[0].message.content)
print('Tokens used:', completion.usage)

Provider routing: In this example, the gateway will first attempt to use Vertex AI to serve the Claude model. If Vertex AI is unavailable or fails, it will fall back to Anthropic. Other providers are still available but will only be used after the specified providers.
You can specify fallback models that will be tried in order if the primary model fails. There are two ways to do this:
Option 1: Direct models field
The simplest way is to use the models field directly at the top level of your request:
import OpenAI from 'openai';
const apiKey = process.env.AI_GATEWAY_API_KEY || process.env.VERCEL_OIDC_TOKEN;
const openai = new OpenAI({
apiKey,
baseURL: 'https://ai-gateway.vercel.sh/v1',
});
const completion = await openai.chat.completions.create({
model: 'openai/gpt-5.2', // Primary model
// @ts-ignore - models is a gateway extension
models: ['anthropic/claude-sonnet-4.5', 'google/gemini-3-pro'], // Fallback models
messages: [
{
role: 'user',
content: 'Write a haiku about TypeScript.',
},
],
stream: false,
});
console.log('Assistant:', completion.choices[0].message.content);
// Check which model was actually used
console.log('Model used:', completion.model);

import os
from openai import OpenAI
api_key = os.getenv('AI_GATEWAY_API_KEY') or os.getenv('VERCEL_OIDC_TOKEN')
client = OpenAI(
api_key=api_key,
base_url='https://ai-gateway.vercel.sh/v1'
)
completion = client.chat.completions.create(
model='openai/gpt-5.2', # Primary model
messages=[
{
'role': 'user',
'content': 'Write a haiku about TypeScript.'
}
],
stream=False,
# models is a gateway extension for fallback models
extra_body={
'models': ['anthropic/claude-sonnet-4.5', 'google/gemini-3-pro'] # Fallback models
}
)
print('Assistant:', completion.choices[0].message.content)
# Check which model was actually used
print('Model used:', completion.model)

Option 2: Via provider options
Alternatively, you can specify model fallbacks through the providerOptions.gateway.models field:
import OpenAI from 'openai';
const apiKey = process.env.AI_GATEWAY_API_KEY || process.env.VERCEL_OIDC_TOKEN;
const openai = new OpenAI({
apiKey,
baseURL: 'https://ai-gateway.vercel.sh/v1',
});
// @ts-expect-error - providerOptions is a gateway extension not in OpenAI types
const completion = await openai.chat.completions.create({
model: 'openai/gpt-5.2', // Primary model
messages: [
{
role: 'user',
content: 'Write a haiku about TypeScript.',
},
],
stream: false,
// Model fallbacks via provider options
providerOptions: {
gateway: {
models: ['anthropic/claude-sonnet-4.5', 'google/gemini-3-pro'], // Fallback models
},
},
});
console.log('Assistant:', completion.choices[0].message.content);
console.log('Model used:', completion.model);

import os
from openai import OpenAI
api_key = os.getenv('AI_GATEWAY_API_KEY') or os.getenv('VERCEL_OIDC_TOKEN')
client = OpenAI(
api_key=api_key,
base_url='https://ai-gateway.vercel.sh/v1'
)
completion = client.chat.completions.create(
model='openai/gpt-5.2', # Primary model
messages=[
{
'role': 'user',
'content': 'Write a haiku about TypeScript.'
}
],
stream=False,
# Model fallbacks via provider options
extra_body={
'providerOptions': {
'gateway': {
'models': ['anthropic/claude-sonnet-4.5', 'google/gemini-3-pro'] # Fallback models
}
}
}
)
print('Assistant:', completion.choices[0].message.content)
print('Model used:', completion.model)

Which approach to use: Both methods achieve the same result. Use the
direct models field (Option 1) for simplicity, or use providerOptions
(Option 2) if you're already using provider options for other configurations.
Both configurations will:
- Try the primary model (openai/gpt-5.2) first
- If it fails, try anthropic/claude-sonnet-4.5
- If that also fails, try google/gemini-3-pro
- Return the result from the first model that succeeds
Provider options work with streaming requests as well:
import OpenAI from 'openai';
const apiKey = process.env.AI_GATEWAY_API_KEY || process.env.VERCEL_OIDC_TOKEN;
const openai = new OpenAI({
apiKey,
baseURL: 'https://ai-gateway.vercel.sh/v1',
});
// @ts-expect-error - providerOptions is a gateway extension not in OpenAI types
const stream = await openai.chat.completions.create({
model: 'anthropic/claude-sonnet-4.5',
messages: [
{
role: 'user',
content:
'Tell me the history of the San Francisco Mission-style burrito in two paragraphs.',
},
],
stream: true,
providerOptions: {
gateway: {
order: ['vertex', 'anthropic'],
},
},
});
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content;
if (content) {
process.stdout.write(content);
}
}

import os
from openai import OpenAI
api_key = os.getenv('AI_GATEWAY_API_KEY') or os.getenv('VERCEL_OIDC_TOKEN')
client = OpenAI(
api_key=api_key,
base_url='https://ai-gateway.vercel.sh/v1'
)
stream = client.chat.completions.create(
model='anthropic/claude-sonnet-4.5',
messages=[
{
'role': 'user',
'content': 'Tell me the history of the San Francisco Mission-style burrito in two paragraphs.'
}
],
stream=True,
extra_body={
'providerOptions': {
'gateway': {
'order': ['vertex', 'anthropic']
}
}
}
)
for chunk in stream:
content = chunk.choices[0].delta.content
if content:
print(content, end='', flush=True)

For more details about available providers and advanced provider configuration, see the Provider Options documentation.
You can pass your own provider credentials on a per-request basis using the byok option in providerOptions.gateway. This allows you to use your existing provider accounts and access private resources without configuring credentials in the gateway settings.
import OpenAI from 'openai';
const apiKey = process.env.AI_GATEWAY_API_KEY || process.env.VERCEL_OIDC_TOKEN;
const openai = new OpenAI({
apiKey,
baseURL: 'https://ai-gateway.vercel.sh/v1',
});
// @ts-expect-error - byok is a gateway extension
const completion = await openai.chat.completions.create({
model: 'anthropic/claude-sonnet-4.5',
messages: [
{
role: 'user',
content: 'Hello, world!',
},
],
providerOptions: {
gateway: {
byok: {
anthropic: [{ apiKey: process.env.ANTHROPIC_API_KEY }],
},
},
},
});
console.log(completion.choices[0].message.content);

import os
from openai import OpenAI
api_key = os.getenv('AI_GATEWAY_API_KEY') or os.getenv('VERCEL_OIDC_TOKEN')
client = OpenAI(
api_key=api_key,
base_url='https://ai-gateway.vercel.sh/v1'
)
completion = client.chat.completions.create(
model='anthropic/claude-sonnet-4.5',
messages=[
{
'role': 'user',
'content': 'Hello, world!'
}
],
extra_body={
'providerOptions': {
'gateway': {
'byok': {
'anthropic': [{'apiKey': os.getenv('ANTHROPIC_API_KEY')}]
}
}
}
}
)
print(completion.choices[0].message.content)

The byok option is a record where keys are provider slugs and values are arrays of credential objects. Each provider can have multiple credentials that are tried in order.
Credential structure by provider:
- Anthropic: { apiKey: string }
- OpenAI: { apiKey: string }
- Google Vertex AI: { project: string, location: string, googleCredentials: { privateKey: string, clientEmail: string } }
- Amazon Bedrock: { accessKeyId: string, secretAccessKey: string, region?: string }
For detailed credential parameters for each provider, see the AI SDK providers documentation.
Multiple credentials example:
providerOptions: {
gateway: {
byok: {
// Multiple credentials for the same provider (tried in order)
vertex: [
{ project: 'proj-1', location: 'us-east5', googleCredentials: { privateKey: '...', clientEmail: '...' } },
{ project: 'proj-2', location: 'us-east5', googleCredentials: { privateKey: '...', clientEmail: '...' } },
],
// Multiple providers
anthropic: [{ apiKey: 'sk-ant-...' }],
},
},
},

Credential precedence: When request-scoped BYOK credentials are provided, any cached BYOK credentials configured in the gateway settings are not considered. Requests may still fall back to system credentials if the provided credentials fail. For persistent BYOK configuration, see the BYOK documentation.
Anthropic Claude models support prompt caching, which can significantly reduce costs and latency for repeated prompts. When you mark content with cache_control, the model caches that content and reuses it for subsequent requests with the same prefix.
import OpenAI from 'openai';
const apiKey = process.env.AI_GATEWAY_API_KEY || process.env.VERCEL_OIDC_TOKEN;
const openai = new OpenAI({
apiKey,
baseURL: 'https://ai-gateway.vercel.sh/v1',
});
const response = await openai.chat.completions.create({
model: 'anthropic/claude-sonnet-4.5',
messages: [
{
role: 'user',
content: 'Analyze this document and summarize the key points.',
cache_control: {
type: 'ephemeral',
},
},
],
});
console.log(response.choices[0].message.content);

import os
from openai import OpenAI
api_key = os.getenv('AI_GATEWAY_API_KEY') or os.getenv('VERCEL_OIDC_TOKEN')
client = OpenAI(
api_key=api_key,
base_url='https://ai-gateway.vercel.sh/v1'
)
response = client.chat.completions.create(
model='anthropic/claude-sonnet-4.5',
messages=[
{
'role': 'user',
'content': 'Analyze this document and summarize the key points.',
'cache_control': {
'type': 'ephemeral'
}
}
]
)
print(response.choices[0].message.content)

Cache control types: The ephemeral cache type stores content for the
duration of the session. This is useful for large system prompts, documents,
or context that you want to reuse across multiple requests. Prompt caching
works with Anthropic models across all supported providers (Anthropic, Vertex
AI, and Bedrock). For more details, see Anthropic's prompt caching
documentation.
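For instance, to reuse a large system prompt across requests, the same cache_control marker can be attached to the system message. This is a minimal sketch; it assumes cache_control is accepted on system messages through the gateway the same way Anthropic's native API caches system prompts, and the document text is a placeholder:

import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.AI_GATEWAY_API_KEY,
  baseURL: 'https://ai-gateway.vercel.sh/v1',
});

const referenceDocument = '...'; // placeholder for a large document loaded elsewhere

const response = await openai.chat.completions.create({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [
    {
      role: 'system',
      content: `You answer questions about the following document:\n${referenceDocument}`,
      // @ts-expect-error - cache_control is a provider extension not in OpenAI types
      cache_control: { type: 'ephemeral' }, // cache the large prefix for reuse
    },
    {
      role: 'user',
      content: 'Summarize the key points.',
    },
  ],
});

console.log(response.choices[0].message.content);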
Anthropic Claude models support an extended context window of up to 1 million tokens for processing very large documents or conversations. To enable this feature, pass the anthropic-beta header with your request.
import OpenAI from 'openai';
const apiKey = process.env.AI_GATEWAY_API_KEY || process.env.VERCEL_OIDC_TOKEN;
const openai = new OpenAI({
apiKey,
baseURL: 'https://ai-gateway.vercel.sh/v1',
});
const response = await openai.chat.completions.create(
{
model: 'anthropic/claude-sonnet-4.5',
messages: [
{
role: 'user',
content: 'Your very long prompt here...',
},
],
},
{
headers: {
'anthropic-beta': 'context-1m-2025-08-07',
},
},
);
console.log(response.choices[0].message.content);

import os
from openai import OpenAI
api_key = os.getenv('AI_GATEWAY_API_KEY') or os.getenv('VERCEL_OIDC_TOKEN')
client = OpenAI(
api_key=api_key,
base_url='https://ai-gateway.vercel.sh/v1'
)
response = client.chat.completions.create(
model='anthropic/claude-sonnet-4.5',
messages=[
{
'role': 'user',
'content': 'Your very long prompt here...'
}
],
extra_headers={
'anthropic-beta': 'context-1m-2025-08-07'
}
)
print(response.choices[0].message.content)

When to use extended context: The 1M context window is useful when working with very large documents, extensive codebases, or long conversation histories that exceed the standard 200K token limit. Note that longer contexts may increase latency and costs. For more details, see Anthropic's context window documentation.