Seedance 2.0
Seedance 2.0 generates video with synchronized multilingual audio, professional camera work, multi-shot composition, and in-video text rendering. Inputs include text, image, multimodal reference, and existing video for editing and extension.
import { experimental_generateVideo as generateVideo } from 'ai';

const result = await generateVideo({
  model: 'bytedance/seedance-2.0',
  prompt: 'A serene mountain lake at sunrise.',
});

What To Consider When Choosing a Provider
Zero Data Retention
AI Gateway does not currently support Zero Data Retention for this model. See the documentation for models that support ZDR.

Authentication
AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
Seedance 2.0 accepts text, image, multimodal reference (image plus video plus audio), and existing video as input. If your pipeline handles multiple input modes, route through a single model rather than composing separate generation steps.
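As a sketch of that single-model routing, the helper below assembles one call configuration from whatever inputs the pipeline has, rather than branching into separate generation steps. The `model` and `prompt` fields match the snippet above; the `referenceImage` and `referenceVideo` option names are illustrative assumptions, not confirmed API, so check the AI SDK and gateway docs for the exact provider options.

```typescript
type SeedanceInput = {
  prompt: string;
  referenceImageUrl?: string; // optional image reference
  referenceVideoUrl?: string; // optional video to edit or extend
};

// Build a single generateVideo call config from the available inputs.
// Only the fields the pipeline actually has are included.
function buildSeedanceCall(input: SeedanceInput) {
  return {
    model: 'bytedance/seedance-2.0',
    prompt: input.prompt,
    providerOptions: {
      bytedance: {
        ...(input.referenceImageUrl && { referenceImage: input.referenceImageUrl }),
        ...(input.referenceVideoUrl && { referenceVideo: input.referenceVideoUrl }),
      },
    },
  };
}
```

The same function then covers text-to-video (prompt only), image-to-video, and video editing, and the result can be passed to `generateVideo` as in the snippet above.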
When to Use Seedance 2.0
Best For
Multimodal video workflows:
Text, image, video, and audio inputs combined in a single reference-to-video generation
Character-driven content:
Scenes with facial expressions, physical interactions, and synchronized dialogue in multiple languages
Cinematic production:
Professional camera movements and multi-shot composition that extend beyond typical social clip defaults
In-video text rendering:
Content where legible text inside the generated video matters for brand or narrative
Video editing and extension:
Modifying existing video or extending a source clip without regenerating from scratch
Consider Alternatives When
Maximum generation speed:
Seedance 2.0 Fast trades some quality for faster turnaround and lower cost
Static image generation:
Use a dedicated image model when motion isn't required
Video understanding only:
Use a vision-language model when you need to analyze existing video rather than generate new content
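The quality-versus-speed trade-off above can be wired into a pipeline as a one-line model selector. A minimal sketch: `bytedance/seedance-2.0` is the ID used on this page, while `bytedance/seedance-2.0-fast` is an assumed ID for the Fast variant, so confirm it against the model catalog before relying on it.

```typescript
// Pick the 2.0 variant based on whether quality or turnaround matters more.
// 'bytedance/seedance-2.0-fast' is an assumed model ID -- verify in the catalog.
function pickSeedanceModel(priority: 'quality' | 'speed'): string {
  return priority === 'speed'
    ? 'bytedance/seedance-2.0-fast'
    : 'bytedance/seedance-2.0';
}
```

Because both variants share the same input types, swapping the model ID is the only change needed to move a workload between them.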
Conclusion
Seedance 2.0 consolidates second-generation Seedance capabilities into a single model: multimodal inputs, high-fidelity motion, native synchronized audio, professional camera work, and in-video text. For teams producing character-driven or cinematic short-form video, it's the quality-focused default in the 2.0 line.
FAQ
What input modes does Seedance 2.0 support?
Text-to-video, image-to-video, multimodal reference-to-video (combining image, video, and audio inputs), and video editing and extension.
Does Seedance 2.0 generate audio?
Yes. Audio generation is native and synchronized to the video, with multilingual support for dialogue, sound effects, and ambient audio. No separate text-to-speech or audio compositing step is required.
How does Seedance 2.0 differ from Seedance 2.0 Fast?
The standard variant targets the highest output quality in the 2.0 line. Seedance 2.0 Fast shares the same input types and capabilities but prioritizes speed and lower cost. Choose based on whether quality or turnaround matters more.
Can Seedance 2.0 render text inside the video?
Yes. In-video text rendering is one of the new capabilities in the 2.0 line, useful for brand, narrative, and informational content.
What resolutions and durations are supported?
Examples shown at launch use 720p at a 16:9 aspect ratio, with clip durations from 5 to 10 seconds.
Does AI Gateway support Zero Data Retention for this model?
Zero Data Retention is not currently available for this model. ZDR on AI Gateway applies to direct gateway requests; BYOK flows aren't covered. See https://vercel.com/docs/ai-gateway/capabilities/zdr for details.
How much does Seedance 2.0 cost through AI Gateway?
Current pricing is shown on this page. AI Gateway applies no markup on video generation, so the rate matches the direct ByteDance provider price.
How do I call Seedance 2.0 through AI Gateway?
Set the model to bytedance/seedance-2.0 and call it through the AI SDK's generateVideo function. AI Gateway handles authentication, retries, and provider failover.