
Gemini 2.5 Flash Lite

google/gemini-2.5-flash-lite

Gemini 2.5 Flash Lite is the fastest and most affordable model in the Gemini 2.5 family. It offers configurable thinking, a 1.0M-token context window, and benchmark improvements over 2.0 Flash-Lite across coding, math, and science, at a price designed for high-throughput agentic pipelines.

File Input · Reasoning · Tool Use · Vision (Image) · Web Search · Implicit Caching
index.ts
import { streamText } from 'ai'

const result = streamText({
  model: 'google/gemini-2.5-flash-lite',
  prompt: 'Why is the sky blue?',
})

// streamText returns immediately; consume the stream to receive the response
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}

What To Consider When Choosing a Provider

  • Zero Data Retention

    AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.

  • Authentication

    AI Gateway authenticates requests using an API key or OIDC token, so you do not need to manage provider credentials directly.

Applications that use the thinking feature should benchmark total token cost under realistic thinking budgets, since thinking tokens are billed as output tokens.
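As a minimal sketch of that benchmarking, the helper below estimates per-request cost with thinking tokens billed at the output rate. The per-million-token prices are placeholders, not published rates; substitute your actual pricing.

```typescript
// Hypothetical $/1M token rates -- replace with your actual pricing
const INPUT_PRICE_PER_M = 0.10
const OUTPUT_PRICE_PER_M = 0.40

function estimateRequestCost(
  inputTokens: number,
  answerTokens: number,
  thinkingTokens: number,
): number {
  // Thinking tokens count toward output, so they share the output rate
  const billedOutput = answerTokens + thinkingTokens
  return (
    (inputTokens / 1_000_000) * INPUT_PRICE_PER_M +
    (billedOutput / 1_000_000) * OUTPUT_PRICE_PER_M
  )
}
```

Comparing the same request with and without a thinking budget makes it easy to see how quickly thinking tokens come to dominate total cost.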

When to Use Gemini 2.5 Flash Lite

Best For

  • High-volume agentic pipelines needing occasional reasoning:

    The thinking toggle allows selective deliberation on harder steps without paying full 2.5 Flash prices for every call in the pipeline

  • Migrating from 2.0 Flash-Lite:

    Benchmark improvements across coding and math mean the upgrade delivers measurable quality gains on common developer tasks at comparable cost

  • Latency-sensitive applications within the 2.5 family:

    When 2.5 Flash or 2.5 Pro latency is too high for the user experience, Flash-Lite provides 2.5-generation quality at the fastest 2.5 response times

  • Translation, classification, and data extraction at scale:

    Strong instruction following and fast response make it a reliable workhorse for structured-output production tasks
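For structured-output tasks like the ones above, a common pattern is to pair the model call with a strict output contract and a cheap validation step. The `Ticket` shape and `parseTicket` helper below are illustrative, not part of the AI SDK; in production this sits behind a `generateText` or `generateObject` call.

```typescript
// Illustrative output contract for a support-ticket extraction task
type Ticket = {
  category: 'billing' | 'bug' | 'feature'
  priority: 'low' | 'high'
}

// Validate the model's JSON output before it enters the pipeline;
// rejecting malformed output here is cheaper than downstream failures.
function parseTicket(raw: string): Ticket | null {
  try {
    const obj = JSON.parse(raw)
    const categories = ['billing', 'bug', 'feature']
    const priorities = ['low', 'high']
    if (categories.includes(obj.category) && priorities.includes(obj.priority)) {
      return { category: obj.category, priority: obj.priority }
    }
  } catch {
    // fall through to null on invalid JSON
  }
  return null
}
```

Rejected outputs can then be retried, optionally at a higher thinking level.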

Consider Alternatives When

  • Maximum reasoning depth is required:

    2.5 Flash or 2.5 Pro with uncapped thinking budget is more appropriate for the most complex multi-step problems

  • Image generation is needed:

    Gemini 2.5 Flash Lite does not generate images. Gemini models with native image output are available in the 2.5 Flash Image and 3.x families

  • Your workload is pure annotation/extraction without reasoning:

    For text-output-only extraction at maximum cost efficiency, 2.0 Flash-Lite's lower price floor may be preferable

Conclusion

Gemini 2.5 Flash Lite closes the gap between 2.0 Flash-Lite and the full 2.5 Flash tier. It delivers better benchmark performance and thinking capability at the same latency profile teams already depend on. For 2.0 Flash-Lite users, it's the natural upgrade.

FAQ

What thinking levels are available?

Four levels: minimal, low, medium, and high. You set the level per request. Thinking tokens are added to the output token count, so higher thinking levels increase both quality and cost.

How does it compare to 2.0 Flash-Lite?

Gemini 2.5 Flash Lite shows cross-category benchmark improvements in coding, mathematics, science, and reasoning over 2.0 Flash-Lite.

Does it support multimodal input?

Yes, the model accepts multimodal inputs including images, audio, and documents alongside text, within the 1.0M-token context window.

How fast is Gemini 2.5 Flash Lite?

Gemini 2.5 Flash Lite is the fastest and lowest-latency model in the 2.5 family. It provides 2.5-generation capability with first-token times lower than 2.5 Flash or 2.5 Pro.

When should I raise the thinking level?

When a subset of requests in a pipeline hits problems that need more deliberation, such as math problems, multi-step instructions, or ambiguous classification. Setting thinking to minimal for routine requests and medium for flagged hard ones keeps average cost low.
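That routing can be sketched as a small helper. The request flags here are illustrative; in a real pipeline they would come from a cheap classifier or heuristics on the incoming task.

```typescript
type ThinkingLevel = 'minimal' | 'low' | 'medium' | 'high'

// Illustrative flags describing why a request might need deliberation
type RequestProfile = {
  needsMath?: boolean
  multiStep?: boolean
  ambiguous?: boolean
}

// Route routine requests to minimal thinking and flagged hard ones to
// medium, keeping average cost low across the pipeline
function chooseThinkingLevel(req: RequestProfile): ThinkingLevel {
  const hard = req.needsMath || req.multiStep || req.ambiguous
  return hard ? 'medium' : 'minimal'
}
```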

How do I get started?

Use the identifier google/gemini-2.5-flash-lite with any supported interface. Set the thinking level via provider options in the AI SDK or via request parameters in direct API calls.
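As a sketch of the provider-options route, the helper below builds the payload for a per-request thinking budget. The `google`/`thinkingConfig` shape follows the AI SDK's Google provider options, but confirm the exact field names against the current provider documentation.

```typescript
// Build the providerOptions payload for a per-request thinking budget.
// Field names follow the AI SDK's Google provider options; verify them
// against current docs before relying on this shape.
function thinkingOptions(thinkingBudget: number) {
  return { google: { thinkingConfig: { thinkingBudget } } }
}

// Passed through to a call such as:
// generateText({
//   model: 'google/gemini-2.5-flash-lite',
//   prompt,
//   providerOptions: thinkingOptions(512),
// })
```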