Gemini 3.1 Flash Lite Preview
Gemini 3.1 Flash Lite Preview is the efficiency-focused model in the Gemini 3.1 generation, built for budget-constrained, high-volume workloads. It delivers notable gains in translation, data extraction, and code completion over Gemini 2.5 Flash Lite, and offers four configurable thinking levels.
import { streamText } from 'ai'

const result = streamText({
  model: 'google/gemini-3.1-flash-lite-preview',
  prompt: 'Why is the sky blue?',
})

What To Consider When Choosing a Provider
Zero Data Retention
AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.

Authentication
AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
Gemini 3.1 Flash Lite Preview supports four thinking levels (minimal, low, medium, and high), giving you fine-grained control over the reasoning-to-cost tradeoff across different task types within the same deployment.
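The thinking level is set per request through provider options. A minimal TypeScript sketch of a helper that builds those options, assuming the providerOptions.google.thinkingConfig shape used by the AI SDK's Google provider (the thinkingOptions helper name is illustrative):

```typescript
// The four thinking levels exposed by Gemini 3.1 Flash Lite Preview.
type ThinkingLevel = 'minimal' | 'low' | 'medium' | 'high';

// Build the providerOptions fragment that selects a thinking level.
function thinkingOptions(level: ThinkingLevel) {
  return {
    google: {
      thinkingConfig: { thinkingLevel: level },
    },
  };
}

// Usage (sketch): pass the result as providerOptions alongside the model id, e.g.
// streamText({
//   model: 'google/gemini-3.1-flash-lite-preview',
//   providerOptions: thinkingOptions('minimal'),
//   prompt: '...',
// })
```

Because the options are plain per-request data, different call sites in the same application can pick different levels without any architectural changes.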
When to Use Gemini 3.1 Flash Lite Preview
Best For
High-volume data extraction:
Pipelines processing millions of structured or semi-structured inputs
Bulk translation workloads:
Per-character cost directly determines operational feasibility
Code completion and review:
Inline suggestions and automated review at IDE or CI/CD pipeline scale
Parallel sub-agent systems:
Agentic architectures where aggregate token cost is a binding constraint
Latency-first applications:
Task complexity is moderate to low and response speed is the primary criterion
Consider Alternatives When
Lite-tier quality insufficient:
Your task quality cannot tolerate the tradeoffs (consider google/gemini-3.1-flash-image-preview or google/gemini-3-flash for higher quality)
Multi-step reasoning:
Your workflow involves complex documents or images (consider google/gemini-3.1-pro-preview or google/gemini-3-pro-preview)
Native image generation:
You need image output alongside text (consider google/gemini-3.1-flash-image-preview)
Maximum reasoning depth:
The task requires the deepest reasoning without cost constraints (consider google/gemini-3.1-pro-preview)
Conclusion
Gemini 3.1 Flash Lite Preview brings the quality advances of the 3.1 generation to the highest-volume, most cost-sensitive segment of the model market. With four thinking levels and documented improvements in the task categories that drive the most tokens in production, it's the natural choice for teams optimizing for scale economics rather than peak capability.
FAQ
What are the thinking levels, and how do they differ?
The four levels are minimal, low, medium, and high. Lower levels reduce the amount of reasoning compute applied before generating a response, which decreases latency and token consumption but may reduce quality on complex tasks. high applies the most reasoning, similar to configuring a reasoning model for thorough inference.
Which tasks improved the most over the previous generation?
Translation, data extraction, and code completion saw the largest improvements over Gemini 2.5 Flash Lite. These are the high-volume task categories where the efficiency gains of the 3.1 generation have the most practical impact.
Is the model suitable for agentic workloads?
Yes. High-volume agentic tasks are a primary target. The model's low cost and configurable thinking levels make it appropriate for sub-agents in hierarchical agent systems.
Can different requests use different thinking levels?
Yes. You set thinkingLevel per request in providerOptions.google.thinkingConfig, so different request types within the same application can use different levels without any architectural changes.
Does the model support streaming?
Yes. Use streamText from the AI SDK with model: 'google/gemini-3.1-flash-lite-preview' to stream responses.
How does Flash Lite differ from Gemini 3 Flash?
Gemini 3 Flash prioritizes pro-grade reasoning at flash speed and is positioned as the standard speed/quality balance point. Flash Lite is specifically optimized for maximum cost efficiency and high-volume throughput, trading some capability headroom for a lower price point.
Can I stream the model's reasoning?
Yes. Set includeThoughts: true in providerOptions.google.thinkingConfig to stream the model's reasoning tokens alongside the generated response.
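A minimal sketch of the corresponding options object, assuming the providerOptions.google.thinkingConfig shape used by the AI SDK's Google provider (the reasoningOptions helper name is illustrative):

```typescript
// Build providerOptions that both select a thinking level and enable
// streaming of reasoning tokens alongside the generated text.
function reasoningOptions(level: 'minimal' | 'low' | 'medium' | 'high') {
  return {
    google: {
      thinkingConfig: {
        thinkingLevel: level,
        includeThoughts: true, // stream reasoning tokens with the response
      },
    },
  };
}

// Usage (sketch): pass the result as providerOptions to streamText and
// read the reasoning parts of the stream alongside the text deltas.
```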
Which thinking level should I use for translation?
Start with low or minimal for straightforward translation tasks where throughput is the primary concern. Increase to medium for content requiring cultural nuance or domain-specific accuracy.
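That heuristic can be sketched as a tiny routing helper; the needsNuance flag and translationLevel name are hypothetical, and the low/medium mapping follows the guidance above:

```typescript
type ThinkingLevel = 'minimal' | 'low' | 'medium' | 'high';

// Route straightforward translations to 'low' for throughput;
// escalate to 'medium' when cultural nuance or domain accuracy matters.
function translationLevel(needsNuance: boolean): ThinkingLevel {
  return needsNuance ? 'medium' : 'low';
}
```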