DeepSeek R1 0528
DeepSeek R1 0528 is DeepSeek's open-source reasoning model, released May 28, 2025. It scores 79.8% Pass@1 on AIME 2024 and 97.3% on MATH-500. Weights ship under the MIT License for commercial use.
import { streamText } from 'ai'

const result = streamText({
  model: 'deepseek/deepseek-r1',
  prompt: 'Why is the sky blue?',
})
What To Consider When Choosing a Provider
Zero Data Retention
AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
Authentication
AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
DeepSeek R1 0528 generates verbose reasoning traces before final answers. Budget output tokens generously and account for variable response length when estimating costs.
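Because the reasoning trace is billed as output tokens, the visible answer is only part of what you pay for. A minimal sketch of budgeting for this, using a placeholder per-token price (not DeepSeek's actual rate; substitute your provider's published pricing):

```typescript
// Rough output-cost estimator for a reasoning model. The price below is
// a placeholder, not DeepSeek's actual rate.
const PRICE_PER_OUTPUT_TOKEN_USD = 2.19 / 1_000_000 // hypothetical

// Reasoning models emit the chain-of-thought as output tokens, so the
// billable output is the visible answer plus the reasoning trace.
function estimateOutputCost(answerTokens: number, reasoningTokens: number): number {
  return (answerTokens + reasoningTokens) * PRICE_PER_OUTPUT_TOKEN_USD
}

// A 500-token answer preceded by a 4,000-token trace costs 9x the
// answer alone, at any per-token price.
const withTrace = estimateOutputCost(500, 4000)
const answerOnly = estimateOutputCost(500, 0)
console.log((withTrace / answerOnly).toFixed(1)) // 9.0
```

The multiplier is independent of the price you plug in, which is why output-token budgets for reasoning models should be set from observed trace lengths, not answer lengths.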
When to Use DeepSeek R1 0528
Best For
Competitive mathematics:
Formal proof construction and quantitative reasoning where AIME 2024 and MATH-500 benchmark results match your task
Code generation and debugging:
Algorithm design where RL-derived problem-solving patterns produce self-correcting chains before final output
Complex analytical reasoning:
Multi-step reasoning in finance, science, and engineering where showing work and self-verification build trust
Consider Alternatives When
Conversation or summarization:
Extended reasoning traces add unnecessary output token cost for content generation workloads
Hybrid thinking modes:
DeepSeek-V3.1 or later supports both thinking and non-thinking modes through the same endpoint
Strict latency requirements:
Variable response times from long reasoning chains are not acceptable when latency is a hard constraint
Pure creative writing:
Structured reasoning adds no quality benefit for open-ended generation tasks
Conclusion
DeepSeek R1 0528 matches closed-source models on published benchmarks while shipping weights under the MIT License. For math, code, and formal reasoning workloads, it fits teams that need open weights.
FAQ
How was DeepSeek R1 trained?
DeepSeek applied reinforcement learning directly to the base model, bypassing the conventional step of training on human-written reasoning traces. Reasoning patterns like self-verification and reflection emerged from RL exploration rather than curated data.
How does DeepSeek R1 0528 perform on math benchmarks?
79.8% Pass@1 on AIME 2024, on par with OpenAI o1 at release. On MATH-500 it scores 97.3%.
Can DeepSeek R1 0528 be used commercially?
Yes. The MIT License permits commercial use. Many proprietary reasoning models impose stricter restrictions.
What are the context window and architecture?
A context window of 160K tokens. The architecture is Mixture-of-Experts (MoE) with 671B total parameters, activating 37B per forward pass.
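To put the MoE numbers in perspective, only a small fraction of the parameters participate in any single forward pass, which is what keeps per-token compute far below what a dense 671B model would need. A quick back-of-the-envelope check:

```typescript
// MoE routing: each token only touches the activated experts.
const totalParams = 671e9  // total parameters held in memory
const activeParams = 37e9  // parameters activated per forward pass

// Fraction of the network doing work for any one token.
const activeFraction = activeParams / totalParams
console.log((activeFraction * 100).toFixed(1) + '%') // 5.5%
```

So per-token compute resembles a ~37B dense model, while memory requirements are set by the full 671B parameter count.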
How does DeepSeek R1 0528 differ from DeepSeek-V3?
DeepSeek R1 0528 specializes in deep reasoning with extended chain-of-thought. DeepSeek-V3 and later variants are general-purpose models that balance reasoning with faster, lower-cost completions and suit mixed-workload deployments better.
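One way to act on this split is a small routing helper that sends reasoning-heavy tasks to R1 0528 and everything else to a general-purpose variant. The task categories and the `deepseek/deepseek-v3` slug below are illustrative assumptions, not part of the gateway's API:

```typescript
// Hypothetical router: pick a model slug by workload type.
type Task = 'math' | 'code' | 'analysis' | 'chat' | 'summarize'

function pickModel(task: Task): string {
  // Deep multi-step reasoning -> R1 0528; content/chat workloads -> V3.
  const reasoningTasks: Task[] = ['math', 'code', 'analysis']
  return reasoningTasks.includes(task)
    ? 'deepseek/deepseek-r1' // slug used in the snippet above
    : 'deepseek/deepseek-v3' // illustrative slug for a general-purpose model
}

console.log(pickModel('math')) // deepseek/deepseek-r1
console.log(pickModel('chat')) // deepseek/deepseek-v3
```

Routing at the application layer this way keeps reasoning-token costs confined to the tasks that benefit from them.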
Is the reasoning trace visible in responses?
Yes. The chain-of-thought trace appears in the response. This helps with debugging and with applications that display the model's reasoning to end users.
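Some deployments return the trace inline, wrapped in `<think>` tags ahead of the answer, rather than as a separate response field; treat that as an assumption about your provider. Under that assumption, a small parser can separate the trace from the final answer:

```typescript
// Split an inline <think>...</think> trace from the visible answer.
// Assumes at most one think block, at the start of the response.
function splitReasoning(response: string): { reasoning: string; answer: string } {
  const match = response.match(/^<think>([\s\S]*?)<\/think>\s*/)
  if (!match) return { reasoning: '', answer: response.trim() }
  return {
    reasoning: match[1].trim(),
    answer: response.slice(match[0].length).trim(),
  }
}

const raw = '<think>Rayleigh scattering favors short wavelengths.</think>The sky is blue because...'
const { reasoning, answer } = splitReasoning(raw)
console.log(reasoning) // Rayleigh scattering favors short wavelengths.
console.log(answer)    // The sky is blue because...
```

This lets an application log or display the trace separately while showing users only the final answer.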