Gemini 2.5 Flash
Gemini 2.5 Flash is Google's first fully hybrid reasoning model. Developers can toggle thinking on or off and set thinking budgets to tune the balance between quality, cost, and latency, all on top of the fast, multimodal foundation of 2.0 Flash.
```typescript
import { streamText } from 'ai'

const result = streamText({
  model: 'google/gemini-2.5-flash',
  prompt: 'Why is the sky blue?',
})
```

What To Consider When Choosing a Provider
Zero Data Retention
AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.

Authentication
AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
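As a minimal sketch of the key-based path, the gateway credential can be resolved from the environment so provider keys never appear in code. The variable name `AI_GATEWAY_API_KEY` is an assumption here; confirm the exact name in the gateway documentation (OIDC-based auth needs no key at all).

```typescript
// Resolve the gateway API key from the environment.
// AI_GATEWAY_API_KEY is an assumed variable name, not confirmed
// by this page; check your gateway's docs for the exact name.
function resolveGatewayKey(
  env: Record<string, string | undefined> = process.env,
): string | undefined {
  return env.AI_GATEWAY_API_KEY
}
```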
Thinking budgets affect token consumption and latency, so evaluate provider rate limits and pricing tiers with thinking-enabled requests before committing to a provider variant at production scale.
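To make that evaluation concrete, a worst-case cost ceiling per request can be sketched from the pricing and the budget. All rates below are placeholders, not Google's actual prices; the assumption that thinking tokens bill at the output rate should be verified against your provider's pricing page.

```typescript
// Rough per-request cost ceiling at a given thinking budget.
// Prices here are placeholders for illustration, not actual rates.
interface Pricing {
  inputPerMTok: number   // USD per 1M input tokens
  outputPerMTok: number  // USD per 1M output tokens (assumes thinking bills as output)
}

function maxRequestCost(
  pricing: Pricing,
  inputTokens: number,
  maxOutputTokens: number,
  thinkingBudget: number,
): number {
  // Worst case: the model spends the entire thinking budget
  // plus the full visible output allowance.
  const billedOutput = maxOutputTokens + thinkingBudget
  return (
    (inputTokens / 1e6) * pricing.inputPerMTok +
    (billedOutput / 1e6) * pricing.outputPerMTok
  )
}

// Placeholder rates for illustration only.
const pricing: Pricing = { inputPerMTok: 0.3, outputPerMTok: 2.5 }

// Same request, thinking disabled vs. an 8192-token budget:
const noThinking = maxRequestCost(pricing, 2000, 1000, 0)
const withThinking = maxRequestCost(pricing, 2000, 1000, 8192)
```

Running this comparison across a provider's pricing tiers shows how quickly a generous thinking budget multiplies the per-request ceiling.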
When to Use Gemini 2.5 Flash
Best For
Workloads with mixed complexity:
Applications that serve both simple requests and hard reasoning problems benefit from the ability to set per-request thinking budgets rather than paying for full reasoning on every call
Reasoning-intensive pipelines:
Multi-step math, science, coding, or logic tasks where 2.0 Flash's speed was sufficient but accuracy needs improvement
Cost-conscious agentic applications:
Need chain-of-thought planning but cannot afford 2.5 Pro pricing across high request volumes
Coding and code transformation tasks:
Benefit from the reasoning capabilities introduced in the 2.5 generation, including agentic code applications
Multimodal reasoning:
Images, video, or audio inputs that require more nuanced analysis than pattern matching
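The mixed-complexity case above can be sketched as a small router that maps a coarse complexity estimate to a per-request thinking budget. The tier names and token numbers are illustrative, and the `google.thinkingConfig.thinkingBudget` option shape is an assumption about the AI SDK's Google provider options that should be checked against its documentation.

```typescript
// Route each request to a thinking budget based on a coarse
// complexity estimate. Tiers and budgets are illustrative only.
type Complexity = 'simple' | 'moderate' | 'hard'

function pickThinkingBudget(complexity: Complexity): number {
  switch (complexity) {
    case 'simple':   return 0     // thinking off: fast path, like 2.0 Flash
    case 'moderate': return 2048
    case 'hard':     return 8192
  }
}

// Assumed providerOptions shape for the AI SDK Google provider;
// pass this alongside model/prompt in streamText or generateText.
function gatewayOptions(complexity: Complexity) {
  return {
    google: {
      thinkingConfig: { thinkingBudget: pickThinkingBudget(complexity) },
    },
  }
}
```

A classifier upstream (even a cheap heuristic on prompt length or task type) decides the tier, so simple traffic never pays for reasoning it does not need.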
Consider Alternatives When
Deepest reasoning required:
Highly complex problems with no speed or cost constraint, where 2.5 Pro's stronger benchmark scores may justify the premium
Uniform high-volume inference:
Entirely low-complexity workloads where thinking overhead adds cost without benefit, making 2.5 Flash-Lite or 2.0 Flash-Lite more appropriate
Native image or audio output:
2.5 Flash outputs text only, so media generation needs a different model
Conclusion
Gemini 2.5 Flash introduces a new dimension of control to the Flash model family. You can dial reasoning depth from zero to a configured budget, matching compute expenditure to actual task complexity. It retains the efficiency that made Flash popular while unlocking the reasoning quality that previously required a heavier model.
FAQ
What does "hybrid reasoning" mean for Gemini 2.5 Flash?
It means the model operates in two modes: with thinking disabled (behaving like a fast response model comparable to 2.0 Flash) or with thinking enabled at a configurable budget, where it reasons through the problem before generating an answer.
How do thinking budgets work?
You set a per-request parameter that controls how much deliberation the model applies before responding. A higher budget allows more reasoning steps, improving accuracy on complex tasks at the cost of more tokens and higher latency. A lower budget favors speed and cost.
Is 2.5 Flash better than 2.0 Flash when thinking is disabled?
Yes. 2.5 Flash outperforms 2.0 Flash even with thinking disabled; the 2.5 base model is stronger regardless of thinking mode.
Does Gemini 2.5 Flash support Google Search and code execution?
Yes. Google Search and code execution are shared capabilities across all Gemini 2.5 models, including Gemini 2.5 Flash.
How large is the context window?
The context window is 1M tokens.
How does Gemini 2.5 Flash compare to 2.5 Pro?
Gemini 2.5 Flash sits at the Pareto frontier of cost and performance. It delivers strong reasoning at a lower cost than 2.5 Pro, which targets complex tasks with strong benchmark scores.
When was Gemini 2.5 Flash released?
It launched in preview on March 20, 2025. Google later promoted it to stable general availability alongside 2.5 Pro as part of the Gemini 2.5 family expansion.