
Gemini 2.5 Flash

google/gemini-2.5-flash

Gemini 2.5 Flash is Google's first fully hybrid reasoning model, letting developers toggle thinking on or off and set thinking budgets to tune the balance between quality, cost, and latency, all on top of the fast, multimodal foundation of 2.0 Flash.

File Input · Reasoning · Tool Use · Vision (Image) · Web Search · Implicit Caching
index.ts
import { streamText } from 'ai';

// Stream a response through the AI Gateway using the model slug.
const result = streamText({
  model: 'google/gemini-2.5-flash',
  prompt: 'Why is the sky blue?',
});

// Consume the stream as it arrives.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}
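The thinking toggle is set per request through provider options. A minimal sketch of a helper for this, assuming the Google provider reads a `thinkingConfig.thinkingBudget` field (verify the exact field names against the AI SDK's Google provider documentation):

```typescript
// Per-request provider options for Gemini 2.5 Flash thinking.
// ASSUMPTION: the google provider accepts thinkingConfig.thinkingBudget;
// confirm the field names in the AI SDK Google provider docs.
type ThinkingOptions = {
  google: { thinkingConfig: { thinkingBudget: number } };
};

function thinkingOptions(budget: number): ThinkingOptions {
  // A budget of 0 disables thinking entirely; larger budgets
  // allow the model to spend more tokens on reasoning.
  return { google: { thinkingConfig: { thinkingBudget: budget } } };
}

// Passed to streamText/generateText alongside model and prompt, e.g.:
// streamText({
//   model: 'google/gemini-2.5-flash',
//   prompt: 'Plan a multi-step refactor',
//   providerOptions: thinkingOptions(4096),
// });
```

In practice you would pick the budget per request class: 0 for simple lookups, a few thousand tokens for multi-step reasoning.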

What To Consider When Choosing a Provider

  • Zero Data Retention

    AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). See the AI Gateway documentation to configure it.

  • Authentication

    AI Gateway authenticates requests with an API key or OIDC token, so you do not need to manage provider credentials directly.

Thinking budgets affect token consumption and latency, so evaluate provider rate limits and pricing tiers with thinking-enabled requests before committing to a provider variant at production scale.
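To see why this evaluation matters, a back-of-the-envelope cost model (the per-million-token rates below are placeholders, not real prices; substitute your provider's actual rates) shows how a thinking trace inflates per-request cost:

```typescript
// Rough cost comparison: thinking-enabled vs. thinking-disabled requests.
// PLACEHOLDER rates in USD per million tokens; not actual pricing.
const PRICE_PER_M = { input: 0.3, output: 2.5 };

function requestCost(
  inputTokens: number,
  outputTokens: number,
  thinkingTokens: number,
): number {
  // ASSUMPTION: thinking tokens are billed at the output-token rate;
  // check your provider's billing documentation.
  const billedOutput = outputTokens + thinkingTokens;
  return (
    (inputTokens * PRICE_PER_M.input + billedOutput * PRICE_PER_M.output) /
    1_000_000
  );
}

// A 1k-in / 500-out call with a 4k-token thinking trace costs several
// times more than the same call with thinking disabled under these rates.
const withThinking = requestCost(1000, 500, 4000);
const withoutThinking = requestCost(1000, 500, 0);
```

The multiplier grows with the thinking budget, which is why per-request budgets matter for high-volume workloads.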

When to Use Gemini 2.5 Flash

Best For

  • Workloads with mixed complexity:

    Applications that serve both simple requests and hard reasoning problems benefit from the ability to set per-request thinking budgets rather than paying for full reasoning on every call

  • Reasoning-intensive pipelines:

    Multi-step math, science, coding, or logic tasks where 2.0 Flash's speed was sufficient but accuracy needs improvement

  • Cost-conscious agentic applications:

    Need chain-of-thought planning but cannot afford 2.5 Pro pricing across high request volumes

  • Coding and code transformation tasks:

    Benefit from the reasoning capabilities introduced in the 2.5 generation, including agentic code applications

  • Multimodal reasoning:

    Images, video, or audio inputs that require more nuanced analysis than pattern matching

Consider Alternatives When

  • Deepest reasoning required:

    Highly complex problems with no speed or cost constraint, where 2.5 Pro's stronger benchmark scores may justify the premium

  • Uniform high-volume inference:

    Entirely low-complexity workloads where thinking overhead adds cost without benefit, making 2.5 Flash-Lite or 2.0 Flash-Lite more appropriate

  • Native image or audio output:

    2.5 Flash outputs text only, so media generation needs a different model

Conclusion

Gemini 2.5 Flash introduces a new dimension of control to the Flash model family. You can dial reasoning depth from zero to a configured budget, matching compute expenditure to actual task complexity. It retains the efficiency that made Flash popular while unlocking the reasoning quality that previously required a heavier model.

FAQ

What does it mean that Gemini 2.5 Flash is a hybrid reasoning model?

It means the model operates in two modes: with thinking disabled (behaving like a fast response model comparable to 2.0 Flash) or with thinking enabled at a configurable budget, where it reasons through the problem before generating an answer.

How do thinking budgets work?

You set a per-request parameter that controls how much deliberation the model applies before responding. A higher budget allows more reasoning steps, improving accuracy on complex tasks at the cost of more tokens and higher latency. A lower budget favors speed and cost.

Is 2.5 Flash still better than 2.0 Flash when thinking is disabled?

Yes. 2.5 Flash outperforms 2.0 Flash even with thinking disabled; the 2.5 base model is stronger regardless of thinking mode.

Does Gemini 2.5 Flash support Google Search grounding and code execution?

Yes. Google Search and code execution are shared capabilities across all Gemini 2.5 models, including Gemini 2.5 Flash.

What is the context window?

The context window is 1M tokens.

How does Gemini 2.5 Flash compare to Gemini 2.5 Pro?

Gemini 2.5 Flash sits at the Pareto frontier of cost and performance. It delivers strong reasoning at a lower cost than 2.5 Pro, which targets complex tasks with stronger benchmark scores.

When was Gemini 2.5 Flash released?

It launched in preview in April 2025. Google later promoted it to stable general availability alongside 2.5 Pro as part of the Gemini 2.5 family expansion.