Grok 4 Fast Reasoning
Grok 4 Fast Reasoning is the speed-optimized reasoning variant of xAI's Grok 4 Fast. It combines chain-of-thought reasoning with faster inference than the full Grok 4, and supports a 2M-token context window.
```typescript
import { streamText } from 'ai'

const result = streamText({
  model: 'xai/grok-4-fast-reasoning',
  prompt: 'Why is the sky blue?',
})
```

What To Consider When Choosing a Provider
Zero Data Retention
AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.

Authentication
AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
Chain-of-thought traces increase output token consumption. The reasoning process adds tokens that contribute to the response cost, so factor this into budget planning.
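As a rough budgeting sketch, reasoning tokens can be folded into a cost estimate as additional output tokens. The per-token rates below are placeholders for illustration, not Grok 4 Fast Reasoning's actual prices; check the pricing panel on this page for current numbers.

```typescript
// Placeholder rates for illustration only -- not actual Grok pricing.
const INPUT_RATE = 0.20 / 1_000_000   // assumed $ per input token
const OUTPUT_RATE = 0.50 / 1_000_000  // assumed $ per output token

// Reasoning traces contribute to output token consumption, so include
// them alongside the visible answer when estimating spend.
function estimateCost(
  inputTokens: number,
  answerTokens: number,
  reasoningTokens: number,
): number {
  return inputTokens * INPUT_RATE + (answerTokens + reasoningTokens) * OUTPUT_RATE
}
```

At these placeholder rates, a request with 1M input tokens and 1M combined answer-plus-reasoning tokens would cost about $0.70; the point is that reasoning tokens can dominate the output side of the bill even when the visible answer is short.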
Grok 4 Fast Reasoning is faster than the full Grok 4 but slower than the non-reasoning variant. Test with representative prompts to confirm the latency meets your application requirements.
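One way to run that check is to time a few representative requests end to end. Below is a minimal, generic timing helper sketch; you would pass it a closure that performs a real model call (for example, awaiting the result of the `streamText` snippet above) and compare the measured times across variants.

```typescript
// Minimal latency harness: measure wall-clock time of an async call.
// Pass in a closure that performs a representative model request.
async function timed<T>(fn: () => Promise<T>): Promise<{ value: T; ms: number }> {
  const start = performance.now()
  const value = await fn()
  return { value, ms: performance.now() - start }
}
```

Run it several times per prompt and look at the spread, not just one sample; single measurements of network-bound calls are noisy.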
When to Use Grok 4 Fast Reasoning
Best For
Analytical tasks requiring structured reasoning:
Chain-of-thought improves answer quality without needing the full Grok 4's depth
Code review and debugging:
Step-by-step reasoning through logic helps catch issues systematically
Mathematical and scientific problem solving:
Step-by-step reasoning suits problems below competition-grade difficulty
Data analysis and interpretation:
The model needs to reason through trends, anomalies, and relationships
Agentic workflows with complex planning:
The agent benefits from reasoning through multi-step plans before acting
Consider Alternatives When
Hardest reasoning tasks:
The full Grok 4 returns measurably better accuracy on difficult problems
Simple, direct-response tasks:
The non-reasoning variant avoids unnecessary token overhead
Maximum throughput requirements:
Reasoning traces add unacceptable latency to each request
Budget-constrained workloads:
Grok 3 Fast provides adequate reasoning at lower cost
Conclusion
Grok 4 Fast Reasoning balances reasoning depth with inference speed, making it practical for production applications that benefit from chain-of-thought but cannot absorb the full Grok 4's latency. It's well-suited as a default reasoning model for teams that need analytical capabilities in interactive or semi-real-time contexts.
FAQ
How does Grok 4 Fast Reasoning differ from the non-reasoning variant?
Grok 4 Fast Reasoning generates chain-of-thought reasoning traces that improve accuracy on analytical tasks. The non-reasoning variant produces direct answers at lower latency and cost.
When should I use the full Grok 4 instead?
The full Grok 4 provides deeper reasoning at higher latency and cost. Grok 4 Fast Reasoning offers a faster alternative that still benefits from structured thinking on moderately complex tasks.
Can I inspect the model's reasoning?
Yes. The chain-of-thought traces appear in the response. You can inspect the model's reasoning steps and verify its analytical process.
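As a sketch of what consuming those traces can look like: assuming the response exposes content parts tagged by type, reasoning text can be separated from the final answer as below. The `Part` shape here is a simplified assumption for illustration, not the SDK's exact type.

```typescript
// Simplified part shape -- an assumption for illustration, not the SDK's exact type.
type Part = { type: 'reasoning' | 'text'; text: string }

// Collect reasoning-trace text and answer text into separate strings.
function splitReasoning(parts: Part[]): { reasoning: string; answer: string } {
  const join = (t: Part['type']) =>
    parts.filter((p) => p.type === t).map((p) => p.text).join('')
  return { reasoning: join('reasoning'), answer: join('text') }
}
```

Separating the two lets you log or audit the model's analytical steps without showing them to end users.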
What is the context window?
2M tokens.
How do I access Grok 4 Fast Reasoning through AI Gateway?
Use your Vercel AI Gateway API key with xai/grok-4-fast-reasoning as the model identifier. AI Gateway manages provider routing automatically.
How much does Grok 4 Fast Reasoning cost?
Check the pricing panel on this page for today's numbers. AI Gateway tracks rates across every provider that serves Grok 4 Fast Reasoning.
Does AI Gateway support Zero Data Retention for this model?
Zero Data Retention is not currently available for this model. ZDR on AI Gateway applies to direct gateway requests; BYOK flows aren't covered. See https://vercel.com/docs/ai-gateway/capabilities/zdr for details.