Advanced Configuration

Last updated May 21, 2025

Reasoning

Configure reasoning behavior for models that support extended thinking or chain-of-thought reasoning. The reasoning parameter allows you to control how reasoning tokens are generated and returned.

Example request
reasoning-openai-sdk.ts
import OpenAI from 'openai';
 
const apiKey = process.env.AI_GATEWAY_API_KEY || process.env.VERCEL_OIDC_TOKEN;
 
const openai = new OpenAI({
  apiKey,
  baseURL: 'https://ai-gateway.vercel.sh/v1',
});
 
// @ts-expect-error - reasoning parameter not yet in OpenAI types
const completion = await openai.chat.completions.create({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [
    {
      role: 'user',
      content: 'What is the meaning of life? Think before answering.',
    },
  ],
  stream: false,
  reasoning: {
    max_tokens: 2000, // Limit reasoning tokens
    enabled: true, // Enable reasoning
  },
});
 
console.log('Reasoning:', completion.choices[0].message.reasoning);
console.log('Answer:', completion.choices[0].message.content);
console.log(
  'Reasoning tokens:',
  completion.usage.completion_tokens_details?.reasoning_tokens,
);
reasoning.py
import os
from openai import OpenAI
 
api_key = os.getenv('AI_GATEWAY_API_KEY') or os.getenv('VERCEL_OIDC_TOKEN')
 
client = OpenAI(
    api_key=api_key,
    base_url='https://ai-gateway.vercel.sh/v1'
)
 
completion = client.chat.completions.create(
    model='anthropic/claude-sonnet-4.5',
    messages=[
        {
            'role': 'user',
            'content': 'What is the meaning of life? Think before answering.'
        }
    ],
    stream=False,
    extra_body={
        'reasoning': {
            'max_tokens': 2000,
            'enabled': True
        }
    }
)
 
print('Reasoning:', completion.choices[0].message.reasoning)
print('Answer:', completion.choices[0].message.content)
print('Reasoning tokens:', completion.usage.completion_tokens_details.reasoning_tokens)

The reasoning object supports the following parameters:

  • enabled (boolean, optional): Enable reasoning output. When true, the model will provide its reasoning process.

  • max_tokens (number, optional): Maximum number of tokens to allocate for reasoning. This helps control costs and response times. Cannot be used with effort.

  • effort (string, optional): Control reasoning effort level. Accepts:

    • 'none' - Disables reasoning
    • 'minimal' - ~10% of max_tokens
    • 'low' - ~20% of max_tokens
    • 'medium' - ~50% of max_tokens
    • 'high' - ~80% of max_tokens
    • 'xhigh' - ~95% of max_tokens

    Cannot be used with max_tokens.

  • exclude (boolean, optional): When true, excludes reasoning content from the response but still generates it internally. Useful for reducing response payload size.

Mutually exclusive parameters: You cannot specify both effort and max_tokens in the same request. Choose one based on your use case.
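
The examples above set max_tokens; effort and exclude follow the same pattern. Below is a minimal sketch, assuming you want an effort-based budget with the reasoning text omitted from the response payload (the prompt and filename are illustrative):

reasoning-effort.ts
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.AI_GATEWAY_API_KEY,
  baseURL: 'https://ai-gateway.vercel.sh/v1',
});

// @ts-expect-error - reasoning parameter not yet in OpenAI types
const completion = await openai.chat.completions.create({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [
    {
      role: 'user',
      content: 'Plan a three-day itinerary for Kyoto. Think before answering.',
    },
  ],
  stream: false,
  reasoning: {
    effort: 'medium', // ~50% of the reasoning budget; cannot be combined with max_tokens
    exclude: true, // reasoning is still generated, but omitted from the response
  },
});

console.log('Answer:', completion.choices[0].message.content);
console.log(
  'Reasoning tokens:',
  completion.usage.completion_tokens_details?.reasoning_tokens,
);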

When reasoning is enabled, the response includes reasoning content:

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "anthropic/claude-sonnet-4.5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The meaning of life is a deeply personal question...",
        "reasoning": "Let me think about this carefully. The question asks about..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 150,
    "total_tokens": 165,
    "completion_tokens_details": {
      "reasoning_tokens": 50
    }
  }
}

Reasoning content is streamed incrementally in the delta.reasoning field:

reasoning-streaming.ts
import OpenAI from 'openai';
 
const openai = new OpenAI({
  apiKey: process.env.AI_GATEWAY_API_KEY,
  baseURL: 'https://ai-gateway.vercel.sh/v1',
});
 
// @ts-expect-error - reasoning parameter not yet in OpenAI types
const stream = await openai.chat.completions.create({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [
    {
      role: 'user',
      content: 'What is the meaning of life? Think before answering.',
    },
  ],
  stream: true,
  reasoning: {
    enabled: true,
  },
});
 
for await (const chunk of stream) {
  const delta = chunk.choices[0]?.delta;
 
  // Handle reasoning content
  if (delta?.reasoning) {
    process.stdout.write(`[Reasoning] ${delta.reasoning}`);
  }
 
  // Handle regular content
  if (delta?.content) {
    process.stdout.write(delta.content);
  }
}
reasoning-streaming.py
import os
from openai import OpenAI
 
client = OpenAI(
    api_key=os.getenv('AI_GATEWAY_API_KEY'),
    base_url='https://ai-gateway.vercel.sh/v1'
)
 
stream = client.chat.completions.create(
    model='anthropic/claude-sonnet-4.5',
    messages=[
        {
            'role': 'user',
            'content': 'What is the meaning of life? Think before answering.'
        }
    ],
    stream=True,
    extra_body={
        'reasoning': {
            'enabled': True
        }
    }
)
 
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta:
        delta = chunk.choices[0].delta
 
        # Handle reasoning content
        if hasattr(delta, 'reasoning') and delta.reasoning:
            print(f"[Reasoning] {delta.reasoning}", end='', flush=True)
 
        # Handle regular content
        if hasattr(delta, 'content') and delta.content:
            print(delta.content, end='', flush=True)

The AI Gateway preserves reasoning details from models across interactions, normalizing the different formats used by OpenAI, Anthropic, and other providers into a consistent structure. This allows you to switch between models without rewriting your conversation management logic.

This is particularly useful during tool calling workflows where the model needs to resume its thought process after receiving tool results.
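
Below is a minimal sketch of that flow. It assumes a hypothetical get_weather tool, and that echoing the returned assistant message back verbatim (with the reasoning fields the gateway attached to it) is enough for the model to resume its thought process; error handling and the case where no tool call is returned are omitted:

reasoning-tool-resume.ts
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.AI_GATEWAY_API_KEY,
  baseURL: 'https://ai-gateway.vercel.sh/v1',
});

// Hypothetical tool definition, used only for illustration
const tools = [
  {
    type: 'function' as const,
    function: {
      name: 'get_weather',
      description: 'Get the current weather for a city',
      parameters: {
        type: 'object',
        properties: { city: { type: 'string' } },
        required: ['city'],
      },
    },
  },
];

const messages: any[] = [
  { role: 'user', content: 'What is the weather in Paris? Think before answering.' },
];

// First request: the model reasons, then requests a tool call.
// @ts-expect-error - reasoning parameter not yet in OpenAI types
const first = await openai.chat.completions.create({
  model: 'anthropic/claude-sonnet-4.5',
  messages,
  tools,
  reasoning: { enabled: true },
});

const assistantMessage = first.choices[0].message;

// Append the assistant message unchanged (the gateway's normalized reasoning
// fields ride along with it), then append the tool result.
messages.push(assistantMessage);
messages.push({
  role: 'tool',
  tool_call_id: assistantMessage.tool_calls?.[0]?.id ?? '',
  content: JSON.stringify({ city: 'Paris', temperature_c: 21 }),
});

// Second request: the model resumes from its earlier reasoning with the tool result in hand.
// @ts-expect-error - reasoning parameter not yet in OpenAI types
const second = await openai.chat.completions.create({
  model: 'anthropic/claude-sonnet-4.5',
  messages,
  tools,
  reasoning: { enabled: true },
});

console.log(second.choices[0].message.content);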

Controlling reasoning details

When reasoning.enabled is true and reasoning.exclude is not set, responses include a reasoning_details array alongside the standard reasoning text field. This structured field captures cryptographic signatures, encrypted content, and other verification data that providers include with their reasoning output.

Each detail object contains:

  • type: The kind of reasoning detail; one of the following, depending on the provider and model:
    • 'reasoning.text': Contains the actual reasoning content as plain text in the text field. May include a signature field (Anthropic models) for cryptographic verification.
    • 'reasoning.encrypted': Contains encrypted or redacted reasoning content in the data field. Used by OpenAI models when reasoning is protected, or by Anthropic models when thinking is redacted. Preserves the encrypted payload for verification purposes.
    • 'reasoning.summary': Contains a condensed version of the reasoning process in the summary field. Used by OpenAI models to provide a readable summary alongside encrypted reasoning.
  • id (optional): Unique identifier for the reasoning block, used for tracking and correlation
  • format: Provider format identifier - 'openai-responses-v1', 'anthropic-claude-v1', or 'unknown'
  • index (optional): Position in the reasoning sequence (for responses with multiple reasoning blocks)

Example response with reasoning details

For Anthropic models:

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "anthropic/claude-sonnet-4.5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The meaning of life is a deeply personal question...",
        "reasoning": "Let me think about this carefully. The question asks about...",
        "reasoning_details": [
          {
            "type": "reasoning.text",
            "text": "Let me think about this carefully. The question asks about...",
            "signature": "anthropic-signature-xyz",
            "format": "anthropic-claude-v1",
            "index": 0
          }
        ]
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 150,
    "total_tokens": 165,
    "completion_tokens_details": {
      "reasoning_tokens": 50
    }
  }
}

For OpenAI models (returns both summary and encrypted):

{
  "id": "chatcmpl-456",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "openai/o3-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The answer is 42.",
        "reasoning": "Let me calculate this step by step...",
        "reasoning_details": [
          {
            "type": "reasoning.summary",
            "summary": "Let me calculate this step by step...",
            "format": "openai-responses-v1",
            "index": 0
          },
          {
            "type": "reasoning.encrypted",
            "data": "encrypted_reasoning_content_xyz",
            "format": "openai-responses-v1",
            "index": 1
          }
        ]
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 150,
    "total_tokens": 165,
    "completion_tokens_details": {
      "reasoning_tokens": 50
    }
  }
}

Streaming reasoning details

When streaming, reasoning details are delivered incrementally in delta.reasoning_details:

For Anthropic models:

{
  "id": "chatcmpl-123",
  "object": "chat.completion.chunk",
  "created": 1677652288,
  "model": "anthropic/claude-sonnet-4.5",
  "choices": [
    {
      "index": 0,
      "delta": {
        "reasoning": "Let me think.",
        "reasoning_details": [
          {
            "type": "reasoning.text",
            "text": "Let me think.",
            "signature": "anthropic-signature-xyz",
            "format": "anthropic-claude-v1",
            "index": 0
          }
        ]
      },
      "finish_reason": null
    }
  ]
}

For OpenAI models (summary chunks during reasoning, then encrypted at end):

{
  "id": "chatcmpl-456",
  "object": "chat.completion.chunk",
  "created": 1677652288,
  "model": "openai/o3-mini",
  "choices": [
    {
      "index": 0,
      "delta": {
        "reasoning": "Step 1:",
        "reasoning_details": [
          {
            "type": "reasoning.summary",
            "summary": "Step 1:",
            "format": "openai-responses-v1",
            "index": 0
          }
        ]
      },
      "finish_reason": null
    }
  ]
}
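
If you need the complete reasoning_details array once the stream finishes (for example, to pass it back on a later turn), you can accumulate the deltas as they arrive. Below is a minimal sketch, assuming the chunk shapes shown above; merging text fragments that share the same index is omitted for brevity:

reasoning-details-streaming.ts
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.AI_GATEWAY_API_KEY,
  baseURL: 'https://ai-gateway.vercel.sh/v1',
});

// @ts-expect-error - reasoning parameter not yet in OpenAI types
const stream = await openai.chat.completions.create({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [
    {
      role: 'user',
      content: 'What is the meaning of life? Think before answering.',
    },
  ],
  stream: true,
  reasoning: {
    enabled: true,
  },
});

// Collect every reasoning detail fragment as it streams in
const reasoningDetails: any[] = [];

for await (const chunk of stream) {
  const delta: any = chunk.choices[0]?.delta;

  if (delta?.reasoning_details) {
    reasoningDetails.push(...delta.reasoning_details);
  }

  if (delta?.content) {
    process.stdout.write(delta.content);
  }
}

console.log('\nReasoning detail fragments collected:', reasoningDetails.length);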

The AI Gateway automatically maps reasoning parameters to each provider's native format:

  • OpenAI: Maps effort to reasoningEffort and controls summary detail
  • Anthropic: Maps max_tokens to thinking budget tokens
  • Google: Maps to thinkingConfig with budget and visibility settings
  • Groq: Maps exclude to control reasoning format (hidden/parsed)
  • xAI: Maps effort to reasoning effort levels
  • Other providers: Generic mapping applied for compatibility

Automatic extraction: For models that don't natively support reasoning output, the gateway automatically extracts reasoning from <think> tags in the response.

Provider options

The AI Gateway can route your requests across multiple AI providers for better reliability and performance. You can control which providers are used and in what order through the providerOptions parameter.

Example request
provider-options.ts
import OpenAI from 'openai';
 
const apiKey = process.env.AI_GATEWAY_API_KEY || process.env.VERCEL_OIDC_TOKEN;
 
const openai = new OpenAI({
  apiKey,
  baseURL: 'https://ai-gateway.vercel.sh/v1',
});
 
// @ts-expect-error - providerOptions is a gateway extension
const completion = await openai.chat.completions.create({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [
    {
      role: 'user',
      content:
        'Tell me the history of the San Francisco Mission-style burrito in two paragraphs.',
    },
  ],
  stream: false,
  // Provider options for gateway routing preferences
  providerOptions: {
    gateway: {
      order: ['vertex', 'anthropic'], // Try Vertex AI first, then Anthropic
    },
  },
});
 
console.log('Assistant:', completion.choices[0].message.content);
console.log('Tokens used:', completion.usage);
provider-options.py
import os
from openai import OpenAI
 
api_key = os.getenv('AI_GATEWAY_API_KEY') or os.getenv('VERCEL_OIDC_TOKEN')
 
client = OpenAI(
    api_key=api_key,
    base_url='https://ai-gateway.vercel.sh/v1'
)
 
completion = client.chat.completions.create(
    model='anthropic/claude-sonnet-4.5',
    messages=[
        {
            'role': 'user',
            'content': 'Tell me the history of the San Francisco Mission-style burrito in two paragraphs.'
        }
    ],
    stream=False,
    # Provider options for gateway routing preferences
    extra_body={
        'providerOptions': {
            'gateway': {
                'order': ['vertex', 'anthropic']  # Try Vertex AI first, then Anthropic
            }
        }
    }
)
 
print('Assistant:', completion.choices[0].message.content)
print('Tokens used:', completion.usage)

Provider routing: In this example, the gateway will first attempt to use Vertex AI to serve the Claude model. If Vertex AI is unavailable or fails, it will fall back to Anthropic. Other providers are still available but will only be used after the specified providers.

Model fallbacks

You can specify fallback models that will be tried in order if the primary model fails. There are two ways to do this:

Option 1: Direct models field

The simplest way is to use the models field directly at the top level of your request:

model-fallbacks.ts
import OpenAI from 'openai';
 
const apiKey = process.env.AI_GATEWAY_API_KEY || process.env.VERCEL_OIDC_TOKEN;
 
const openai = new OpenAI({
  apiKey,
  baseURL: 'https://ai-gateway.vercel.sh/v1',
});
 
const completion = await openai.chat.completions.create({
  model: 'openai/gpt-5.2', // Primary model
  // @ts-ignore - models is a gateway extension
  models: ['anthropic/claude-sonnet-4.5', 'google/gemini-3-pro'], // Fallback models
  messages: [
    {
      role: 'user',
      content: 'Write a haiku about TypeScript.',
    },
  ],
  stream: false,
});
 
console.log('Assistant:', completion.choices[0].message.content);
 
// Check which model was actually used
console.log('Model used:', completion.model);
model-fallbacks.py
import os
from openai import OpenAI
 
api_key = os.getenv('AI_GATEWAY_API_KEY') or os.getenv('VERCEL_OIDC_TOKEN')
 
client = OpenAI(
    api_key=api_key,
    base_url='https://ai-gateway.vercel.sh/v1'
)
 
completion = client.chat.completions.create(
    model='openai/gpt-5.2',  # Primary model
    messages=[
        {
            'role': 'user',
            'content': 'Write a haiku about TypeScript.'
        }
    ],
    stream=False,
    # models is a gateway extension for fallback models
    extra_body={
        'models': ['anthropic/claude-sonnet-4.5', 'google/gemini-3-pro']  # Fallback models
    }
)
 
print('Assistant:', completion.choices[0].message.content)
 
# Check which model was actually used
print('Model used:', completion.model)
Option 2: Via provider options

Alternatively, you can specify model fallbacks through the providerOptions.gateway.models field:

model-fallbacks-provider-options.ts
import OpenAI from 'openai';
 
const apiKey = process.env.AI_GATEWAY_API_KEY || process.env.VERCEL_OIDC_TOKEN;
 
const openai = new OpenAI({
  apiKey,
  baseURL: 'https://ai-gateway.vercel.sh/v1',
});
 
// @ts-expect-error - providerOptions is a gateway extension
const completion = await openai.chat.completions.create({
  model: 'openai/gpt-5.2', // Primary model
  messages: [
    {
      role: 'user',
      content: 'Write a haiku about TypeScript.',
    },
  ],
  stream: false,
  // Model fallbacks via provider options
  providerOptions: {
    gateway: {
      models: ['anthropic/claude-sonnet-4.5', 'google/gemini-3-pro'], // Fallback models
    },
  },
});
 
console.log('Assistant:', completion.choices[0].message.content);
console.log('Model used:', completion.model);
model-fallbacks-provider-options.py
import os
from openai import OpenAI
 
api_key = os.getenv('AI_GATEWAY_API_KEY') or os.getenv('VERCEL_OIDC_TOKEN')
 
client = OpenAI(
    api_key=api_key,
    base_url='https://ai-gateway.vercel.sh/v1'
)
 
completion = client.chat.completions.create(
    model='openai/gpt-5.2',  # Primary model
    messages=[
        {
            'role': 'user',
            'content': 'Write a haiku about TypeScript.'
        }
    ],
    stream=False,
    # Model fallbacks via provider options
    extra_body={
        'providerOptions': {
            'gateway': {
                'models': ['anthropic/claude-sonnet-4.5', 'google/gemini-3-pro']  # Fallback models
            }
        }
    }
)
 
print('Assistant:', completion.choices[0].message.content)
print('Model used:', completion.model)

Which approach to use: Both methods achieve the same result. Use the direct models field (Option 1) for simplicity, or use providerOptions (Option 2) if you're already using provider options for other configurations.

Both configurations will:

  1. Try the primary model (openai/gpt-5.2) first
  2. If it fails, try anthropic/claude-sonnet-4.5
  3. If that also fails, try google/gemini-3-pro
  4. Return the result from the first model that succeeds

Provider options work with streaming requests as well:

streaming-provider-options.ts
import OpenAI from 'openai';
 
const apiKey = process.env.AI_GATEWAY_API_KEY || process.env.VERCEL_OIDC_TOKEN;
 
const openai = new OpenAI({
  apiKey,
  baseURL: 'https://ai-gateway.vercel.sh/v1',
});
 
// @ts-expect-error - providerOptions is a gateway extension
const stream = await openai.chat.completions.create({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [
    {
      role: 'user',
      content:
        'Tell me the history of the San Francisco Mission-style burrito in two paragraphs.',
    },
  ],
  stream: true,
  providerOptions: {
    gateway: {
      order: ['vertex', 'anthropic'],
    },
  },
});
 
for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) {
    process.stdout.write(content);
  }
}
streaming-provider-options.py
import os
from openai import OpenAI
 
api_key = os.getenv('AI_GATEWAY_API_KEY') or os.getenv('VERCEL_OIDC_TOKEN')
 
client = OpenAI(
    api_key=api_key,
    base_url='https://ai-gateway.vercel.sh/v1'
)
 
stream = client.chat.completions.create(
    model='anthropic/claude-sonnet-4.5',
    messages=[
        {
            'role': 'user',
            'content': 'Tell me the history of the San Francisco Mission-style burrito in two paragraphs.'
        }
    ],
    stream=True,
    extra_body={
        'providerOptions': {
            'gateway': {
                'order': ['vertex', 'anthropic']
            }
        }
    }
)
 
for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end='', flush=True)

For more details about available providers and advanced provider configuration, see the Provider Options documentation.

Bring your own key (BYOK)

You can pass your own provider credentials on a per-request basis using the byok option in providerOptions.gateway. This allows you to use your existing provider accounts and access private resources without configuring credentials in the gateway settings.

Example request
byok.ts
import OpenAI from 'openai';
 
const apiKey = process.env.AI_GATEWAY_API_KEY || process.env.VERCEL_OIDC_TOKEN;
 
const openai = new OpenAI({
  apiKey,
  baseURL: 'https://ai-gateway.vercel.sh/v1',
});
 
// @ts-expect-error - byok is a gateway extension
const completion = await openai.chat.completions.create({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [
    {
      role: 'user',
      content: 'Hello, world!',
    },
  ],
  providerOptions: {
    gateway: {
      byok: {
        anthropic: [{ apiKey: process.env.ANTHROPIC_API_KEY }],
      },
    },
  },
});
 
console.log(completion.choices[0].message.content);
byok.py
import os
from openai import OpenAI
 
api_key = os.getenv('AI_GATEWAY_API_KEY') or os.getenv('VERCEL_OIDC_TOKEN')
 
client = OpenAI(
    api_key=api_key,
    base_url='https://ai-gateway.vercel.sh/v1'
)
 
completion = client.chat.completions.create(
    model='anthropic/claude-sonnet-4.5',
    messages=[
        {
            'role': 'user',
            'content': 'Hello, world!'
        }
    ],
    extra_body={
        'providerOptions': {
            'gateway': {
                'byok': {
                    'anthropic': [{'apiKey': os.getenv('ANTHROPIC_API_KEY')}]
                }
            }
        }
    }
)
 
print(completion.choices[0].message.content)

The byok option is a record where keys are provider slugs and values are arrays of credential objects. Each provider can have multiple credentials that are tried in order.

Credential structure by provider:

  • Anthropic: { apiKey: string }
  • OpenAI: { apiKey: string }
  • Google Vertex AI: { project: string, location: string, googleCredentials: { privateKey: string, clientEmail: string } }
  • Amazon Bedrock: { accessKeyId: string, secretAccessKey: string, region?: string }

For detailed credential parameters for each provider, see the AI SDK providers documentation.

Multiple credentials example:

providerOptions: {
  gateway: {
    byok: {
      // Multiple credentials for the same provider (tried in order)
      vertex: [
        { project: 'proj-1', location: 'us-east5', googleCredentials: { privateKey: '...', clientEmail: '...' } },
        { project: 'proj-2', location: 'us-east5', googleCredentials: { privateKey: '...', clientEmail: '...' } },
      ],
      // Multiple providers
      anthropic: [{ apiKey: 'sk-ant-...' }],
    },
  },
},

Credential precedence: When request-scoped BYOK credentials are provided, any cached BYOK credentials configured in the gateway settings are not considered. Requests may still fall back to system credentials if the provided credentials fail. For persistent BYOK configuration, see the BYOK documentation.

Prompt caching

Anthropic Claude models support prompt caching, which can significantly reduce costs and latency for repeated prompts. When you mark content with cache_control, the model caches that content and reuses it for subsequent requests with the same prefix.

Example request
prompt-caching.ts
import OpenAI from 'openai';
 
const apiKey = process.env.AI_GATEWAY_API_KEY || process.env.VERCEL_OIDC_TOKEN;
 
const openai = new OpenAI({
  apiKey,
  baseURL: 'https://ai-gateway.vercel.sh/v1',
});
 
const response = await openai.chat.completions.create({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [
    {
      role: 'user',
      content: 'Analyze this document and summarize the key points.',
      cache_control: {
        type: 'ephemeral',
      },
    },
  ],
});
 
console.log(response.choices[0].message.content);
prompt-caching.py
import os
from openai import OpenAI
 
api_key = os.getenv('AI_GATEWAY_API_KEY') or os.getenv('VERCEL_OIDC_TOKEN')
 
client = OpenAI(
    api_key=api_key,
    base_url='https://ai-gateway.vercel.sh/v1'
)
 
response = client.chat.completions.create(
    model='anthropic/claude-sonnet-4.5',
    messages=[
        {
            'role': 'user',
            'content': 'Analyze this document and summarize the key points.',
            'cache_control': {
                'type': 'ephemeral'
            }
        }
    ]
)
 
print(response.choices[0].message.content)

Cache control types: The ephemeral cache type stores content for a short, provider-managed lifetime (Anthropic's default is about five minutes, refreshed each time the cached prefix is reused). This is useful for large system prompts, documents, or context that you want to reuse across multiple requests. Prompt caching works with Anthropic models across all supported providers (Anthropic, Vertex AI, and Bedrock). For more details, see Anthropic's prompt caching documentation.
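
For instance, to reuse a large system prompt across requests, you can attach cache_control to the system message. Below is a brief sketch; the system prompt content and filename are placeholders for your own document:

prompt-caching-system.ts
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.AI_GATEWAY_API_KEY,
  baseURL: 'https://ai-gateway.vercel.sh/v1',
});

const response = await openai.chat.completions.create({
  model: 'anthropic/claude-sonnet-4.5',
  messages: [
    {
      role: 'system',
      // Large, stable context you want cached and reused across requests
      content: '<paste your style guide or reference document here>',
      cache_control: {
        type: 'ephemeral',
      },
    },
    {
      role: 'user',
      content: 'Does this sentence follow the style guide? "We utilize synergies."',
    },
  ],
});

console.log(response.choices[0].message.content);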

Extended context

Anthropic Claude models support an extended context window of up to 1 million tokens for processing very large documents or conversations. To enable this feature, pass the anthropic-beta header with your request.

Example request
extended-context.ts
import OpenAI from 'openai';
 
const apiKey = process.env.AI_GATEWAY_API_KEY || process.env.VERCEL_OIDC_TOKEN;
 
const openai = new OpenAI({
  apiKey,
  baseURL: 'https://ai-gateway.vercel.sh/v1',
});
 
const response = await openai.chat.completions.create(
  {
    model: 'anthropic/claude-sonnet-4.5',
    messages: [
      {
        role: 'user',
        content: 'Your very long prompt here...',
      },
    ],
  },
  {
    headers: {
      'anthropic-beta': 'context-1m-2025-08-07',
    },
  },
);
 
console.log(response.choices[0].message.content);
extended-context.py
import os
from openai import OpenAI
 
api_key = os.getenv('AI_GATEWAY_API_KEY') or os.getenv('VERCEL_OIDC_TOKEN')
 
client = OpenAI(
    api_key=api_key,
    base_url='https://ai-gateway.vercel.sh/v1'
)
 
response = client.chat.completions.create(
    model='anthropic/claude-sonnet-4.5',
    messages=[
        {
            'role': 'user',
            'content': 'Your very long prompt here...'
        }
    ],
    extra_headers={
        'anthropic-beta': 'context-1m-2025-08-07'
    }
)
 
print(response.choices[0].message.content)

When to use extended context: The 1M context window is useful when working with very large documents, extensive codebases, or long conversation histories that exceed the standard 200K token limit. Note that longer contexts may increase latency and costs. For more details, see Anthropic's context window documentation.

