Seedance 2.0 is BytePlus new-generation video model featuring superior visual fidelity, longer duration support, and advanced scene understanding for professional-grade content creation.
Pricing
- $0.0070 /K tokens (480p/720p, w/o video)
- $0.0043 /K tokens (480p/720p, w/ video)
- $0.0077 /K tokens (1080p, w/o video)
- $0.0047 /K tokens (1080p, w/ video)
Seedance 2.0 Fast is BytePlus fast-response video model built for lower-latency generation, delivering strong visual quality and efficient turnaround for high-throughput creative workflows.
Pricing
- $0.0056 /K tokens (w/o video)
- $0.0033 /K tokens (w/ video)
Seedance 1.5 is the latest video generation model launched by ByteDance🚀. Based on the functionalities Seedance 1.0 Pro functionalities, Seedance Pro 1.5 can automatically generate matching voices, sound effects, and background music based on text prompts and visual content.
Pricing
- $0.0024 /K tokens(w/ audio)
- $0.0012 /K tokens(w/o audio)
Seedance 1.0 Pro, the large-parameter version of the model suite, features a unique multi-shot storytelling capability and demonstrates outstanding performance across multiple dimensions. It achieves breakthroughs in semantic understanding and instruction-following, enabling the generation of 1080P high-definition videos with smooth motion, rich details, diverse styles, and cinematic visual quality.
Byteplus Seedance 1.0 Professional version, optimized for text-to-video conversion with high-quality video generation capabilities. Supports multiple style transformations, ideal for rapid prototyping and creative exploration. Lightweight design ensures fast response and efficient processing.
Byteplus Seedance 1.0 Lite version, optimized for image-to-video conversion with high-quality video generation capabilities. Supports multiple style transformations, ideal for rapid prototyping and creative exploration. Lightweight design ensures fast response and efficient processing.
BytePlus Seedream 5.0 Lite is the latest image generation model featuring enhanced detail rendering, up to 3K/4K resolution output, and improved prompt adherence for stunning visual creations.
BytePlus Seedream 4.5 is a New High-Aesthetic Image Generation Model with stronger spatial understanding, richer world knowledge, superior aesthetics, higher consistency, and smarter instruction following for precise visual creation.
Byteplus Seedream 4.0 is a SOTA multimodal image generation model, enabling text-to-image creation, image editing and multi-image generation within a single model, supports a wide range of creative scenarios.
Alibaba Cloud Wan 2.6 is an advanced video generation model that creates high-quality videos from text descriptions, supporting multiple resolutions including 720p and 1080p.
Pricing
- T2V 720p - $0.10 /seconds
- T2V 1080p - $0.15 /seconds
- I2V 720p(w/ audio) - $0.10 /seconds
- I2V 1080p(w/ audio) - $0.15 /seconds
- R2V 720p(w/ audio) - $0.10 /seconds
- R2V 1080p(w/ audio) - $0.15 /seconds
- R2V-Flash 720p(w/o audio) - $0.025 /seconds
- R2V-Flash 1080p(w/o audio) - $0.0375 /seconds
- R2V-Flash 720p(w/ audio) - $0.05 /seconds
- R2V-Flash 1080p(w/ audio) - $0.075 /seconds
Alibaba Cloud Wan 2.5 preview is a powerful video generation model that transforms text prompts into dynamic videos, supporting resolutions from 480p to 1080p.
Pricing
- T2V 480p - $0.05 /seconds
- T2V 720p - $0.10 /seconds
- T2V 1080p - $0.15 /seconds
- I2V 480p(w/ audio) - $0.05 /seconds
- I2V 720p(w/ audio) - $0.10 /seconds
- I2V 1080p(w/ audio) - $0.15 /seconds
Alibaba Cloud Qwen Image 2.0 Pro unified image generation and editing model with stronger text rendering, realism, and prompt adherence.
Alibaba Cloud Qwen Image 2.0 accelerated image generation and editing model balancing quality and response time.
Google Gemini 2.5 Flash model, a fast and efficient language model optimized for speed and cost-effectiveness.
Pricing
- Input - $0.30 /M tokens
- Cached input - $0.03 /M tokens
- Output - $2.50 /M tokens
Google Gemini 2.5 Flash model, a fast and efficient language model optimized for speed and cost-effectiveness.
Pricing
- Input - $0.10 /M tokens
- Cached input - $0.01 /M tokens
- Output - $0.40 /M tokens
Google Gemini 2.5 Pro model, offering advanced capabilities with enhanced reasoning and understanding.
Pricing
- Input - $1.25 /M tokens <= 200k Tokens
- Cached input - $0.125 /M tokens <= 200k Tokens
- Input - $2.50 /M tokens > 200k Tokens
- Output - $10.00 /M tokens <= 200k Tokens
- Output - $15.00 /M tokens > 200k Tokens
Google Gemini 2.5 Flash model, a fast and efficient language model optimized for speed and cost-effectiveness.
Pricing
- Input - $0.50 /M tokens
- Cached input - $0.05 /M tokens
- Output - $3.00 /M tokens
Google Gemini 3.1 Pro Preview model, the latest preview version with cutting-edge capabilities.
Pricing
- Input - $2.00 /M tokens <= 200k Tokens
- Cached input - $0.20 /M tokens <= 200k Tokens
- Input - $4.00 /M tokens > 200k Tokens
- Output - $12.00 /M tokens <= 200k Tokens
- Output - $18.00 /M tokens > 200k Tokens
Google Gemini 3.1 Flash-Lite Preview model, optimized for high-throughput lightweight tasks with low latency and cost efficiency.
Pricing
- Input (Text/Image/Video) - $0.25 /M tokens
- Input (Audio) - $0.50 /M tokens
- Output (incl. thinking tokens) - $1.50 /M tokens
- Cached input (Text/Image/Video) - $0.025 /M tokens
- Cached input (Audio) - $0.05 /M tokens
- Cache storage - $1.00 /hour /1M tokens
Google Gemini 3.5 Flash model, optimized for fast multimodal reasoning, search-grounded responses, and cost-efficient high-throughput workloads.
Pricing
- Input - $1.50 /M tokens
- Output (incl. thinking tokens) - $9.00 /M tokens
- Cached input - $0.15 /M tokens
- Cache storage - $1.00 /hour /1M tokens
Google's latest Veo 3.1 video generation model. Generate high-quality videos with synchronized speech/sound effects from a text prompt or reference image
Pricing
- 720p - $0.40 /seconds
- 1080p - $0.40 /seconds
- 4K - $0.60 /seconds
Google's latest Veo 3.1 video generation model. Generate videos with synchronized speech/sound effects from a text prompt or reference image faster
Pricing
- 720p - $0.15 /seconds
- 1080p - $0.15 /seconds
- 4K - $0.35 /seconds
(Nano Banana🍌) Google Gemini 2.5 Flash image model supporting multimodal text and image input/output. Can be used via OpenAI-compatible interface by adding modalities parameter. Lightweight and fast, ideal for real-time application scenarios requiring image understanding and generation.
(Nano Banana Pro🍌) Google Gemini 3 Pro image generation model with advanced multimodal capabilities. Features state-of-the-art image synthesis, enhanced prompt understanding, and superior visual quality. Supports creative image generation, editing, and style transfer with exceptional detail and accuracy.
Pricing
- 1K-2K(1024x1024px-2048x2048px) - $0.134 /pcs
- 4K(4096x4096px) - $0.24 /pcs
(Nano Banana 2🍌) is the latest state-of-the-art image model. Dramatically closes the gap between speed and visual fidelity, delivering high-quality, photorealistic imagery.
Pricing
- 512px -$0.045 /pcs
- 1K - $0.067 /pcs
- 2K - $0.101 /pcs
- 4K - $0.151 /pcs
OpenAI GPT Image 2 is the latest multimodal image generation and editing model, supporting both text and image inputs with high-fidelity image outputs. Delivers improved prompt understanding, better instruction following, and enhanced quality for creative, design, and production scenarios.
Pricing
- Image Input - $8.00 /M tokens
- Image Cached input - $2.00 /M tokens
- Image Output - $30.00 /M tokens
- Text Input - $5.00 /M tokens
- Text Cached input - $1.25 /M tokens
OpenAI's latest Sora 2 video generation model with significantly improved video quality and duration. Standard version offers an affordable entry point for developers and small businesses. Supports more complex scene understanding, more natural physics simulation, and more precise text-to-video conversion. Can generate up to 60 seconds of HD video, perfect for professional content creation.
Pricing
- Portrait 720×1280 - $0.10 /seconds
- Landscape 1280×720 - $0.10 /seconds
OpenAI's advanced Sora 2 Pro is a high-fidelity AI video generation model designed for professional content creators and commercial applications. Features significantly improved video quality, extended duration support, and enhanced scene understanding. Supports multiple aspect ratios (portrait and landscape) with customizable resolution and duration settings. Delivers superior video quality with more natural physics simulation and precise text-to-video conversion. Ideal for brand advertising, product showcases, educational content, and creative productions requiring professional-grade video output.
Pricing
- Portrait 720×1280 - $0.30 /seconds
- Landscape 1280×720 - $0.30 /seconds
- HD Portrait 1024×1792 - $0.50 /seconds
- HD Landscape 1792×1024 - $0.50 /seconds
xAI flagship model with state-of-the-art reasoning, coding, and complex task-solving capabilities.
Pricing
- Input - $3.00 /M tokens
- Cached input - $0.075 /M tokens
- Output - $15.00 /M tokens
Pinned Grok-4 snapshot model for workloads that require a stable versioned release.
Pricing
- Input - $3.00 /M tokens
- Cached input - $0.075 /M tokens
- Output - $15.00 /M tokens
Latest fast-optimized Grok model balancing speed and intelligence for efficient daily tasks.
Pricing
- Input - $0.20 /M tokens
- Cached input - $0.050 /M tokens
- Output - $0.50 /M tokens
Fast Grok 4.1 reasoning model optimized for low-cost reasoning-heavy tasks.
Pricing
- Input - $0.20 /M tokens
- Cached input - $0.050 /M tokens
- Output - $0.50 /M tokens
Fast Grok 4.1 non-reasoning model for high-throughput conversational workloads.
Pricing
- Input - $0.20 /M tokens
- Cached input - $0.050 /M tokens
- Output - $0.50 /M tokens
Speed-optimized version of Grok-4, ideal for high-throughput applications requiring quick responses.
Pricing
- Input - $0.20 /M tokens
- Cached input - $0.050 /M tokens
- Output - $0.50 /M tokens
Fast Grok-4 reasoning model optimized for cost-efficient reasoning workloads.
Pricing
- Input - $0.20 /M tokens
- Cached input - $0.050 /M tokens
- Output - $0.50 /M tokens
Fast Grok-4 non-reasoning model for low-latency text generation scenarios.
Pricing
- Input - $0.20 /M tokens
- Cached input - $0.050 /M tokens
- Output - $0.50 /M tokens
Powerful third-generation Grok with strong reasoning and broad knowledge capabilities.
Pricing
- Input - $3.00 /M tokens
- Cached input - $0.075 /M tokens
- Output - $15.00 /M tokens
Lightweight and cost-effective Grok variant for simpler tasks and budget-conscious applications.
Pricing
- Input - $0.30 /M tokens
- Cached input - $0.050 /M tokens
- Output - $0.50 /M tokens
Kling AI video generation model, optimized for content creation. Supports text-to-video, image-to-video, and multiple generation modes. Features excellent character motion understanding and scene coherence, ideal for short video creation, social media content, and marketing video production.
Kling O1 Video is an advanced video generation model featuring the Omni-Video capability. Supports text-to-video, image-to-video, and video editing with reference images. Offers multiple durations (5-10s), resolutions up to 1080p, and various aspect ratios. Ideal for professional video content creation.
Pricing
- O1 std(w/o ref video) - $0.0857 /seconds
- O1 std(w/ ref video) - $0.1286 /seconds
- O1 pro(w/o ref video) - $0.1143 /seconds
- O1 pro(w/ ref video) - $0.1714 /seconds
Kling O1 Image is a high-quality AI image generation model. Supports text-to-image with multiple aspect ratios, resolutions (1K/2K), and batch generation up to 9 images. Features excellent prompt understanding and photorealistic output quality for creative and commercial applications.
Kling AI platform providing Omni-Image and Omni-Video generation with multi-subject composition, reference assets, and high-quality media creation.
Pricing
- v3 Omni-image 1K/2K - $0.0286 /pcs
- v3 Omni-image 4K - $0.0571 /pcs
- v3 Omni-video std(w/o ref, no audio) - $0.0857 /seconds
- v3 Omni-video std(w/o ref, w/ audio) - $0.1143 /seconds
- v3 Omni-video std(w/ ref, no audio) - $0.1286 /seconds
- v3 Omni-video std(w/ ref, w/ audio) - $0.1286 /seconds
- v3 Omni-video pro(w/o ref, no audio) - $0.1143 /seconds
- v3 Omni-video pro(w/o ref, w/ audio) - $0.1429 /seconds
- v3 Omni-video pro(w/ ref, no audio) - $0.1714 /seconds
- v3 Omni-video pro(w/ ref, w/ audio) - $0.1714 /seconds
MiniMax Hailuo AI video generation model, renowned for natural fluid motion and precise scene understanding. Supports various video styles from realistic to cartoon, static to dynamic. Built-in intelligent scene analysis automatically optimizes video pacing and transition effects.
MiniMax-M2.5 is a general-purpose language model designed for balanced quality, speed, and cost efficiency.
Pricing
- Input - $0.30 /M tokens
- Cached input - $0.03 /M tokens
- Output - $1.20 /M tokens
GLM-5.1 is Zhipu's latest flagship model for chat and agentic tasks.
Pricing
- Input - $1 /M tokens
- Cached Input - $0.2 /M tokens
- Output - $3.2 /M tokens
GLM-4.6v is a cost-effective GLM model supporting chat completion.
Pricing
- Input - $0.3 /M tokens
- Cached Input - $0.05 /M tokens
- Output - $0.9 /M tokens
GLM is Zhipu's flagship model line for complex dialogue and agent tasks. It supports thinking mode, native tool calling, and MCP integration, with up to 128K context for long conversations and multi-step reasoning workflows.
Pricing
- Input [0, 32k) - $0.28 /M tokens
- Output [0, 0.2k) - $1.12 /M tokens
- Input [0, 32k) - $0.42 /M tokens
- Output [0.2k, ∞) - $1.96 /M tokens
- Input [32k, 200k) - $0.56 /M tokens
- Output [0.2k, ∞) - $2.24 /M tokens
- Cached input - $0.11 /M tokens
DeepSeek-V4-Flash supports both non-thinking and thinking modes. Context window: 1M. Max output: 384K. Fast and cost-effective for daily tasks and agent workflows.
Pricing
- Input (cache hit) - $0.0028 /M tokens
- Input (cache miss) - $0.14 /M tokens
- Output - $0.28 /M tokens
DeepSeek-V4-Pro is the most powerful DeepSeek model with advanced reasoning capabilities. Context window: 1M. Max output: 384K. Ideal for complex reasoning and agent tasks.
Pricing
- Input (cache hit) - $0.0145 /M tokens
- Input (cache miss) - $1.74 /M tokens
- Output - $3.48 /M tokens
DeepSeek-V3.2 non-thinking mode for fast chat and standard tool usage. Context window: 128K. Output: default 4K, max 8K.
Pricing
- Input (cache hit) - $0.028 /M tokens
- Input (cache miss) - $0.28 /M tokens
- Output - $0.42 /M tokens
DeepSeek-V3.2 thinking mode (deepseek-reasoner) for complex reasoning and agent tasks. Context window: 128K. Output: default 32K, max 64K.
Pricing
- Input (cache hit) - $0.028 /M tokens
- Input (cache miss) - $0.28 /M tokens
- Output - $0.42 /M tokens
Doubao Seed 1.6 Thinking version, optimized for analytical tasks. Features powerful logical reasoning and problem decomposition capabilities, displaying detailed thought processes. Ideal for education, academic research, and business analysis requiring transparent reasoning.
Pricing
- Input [0,32K] - CNY 0.80 /M tokens
- Output [0,32K] - CNY 8.00 /M tokens
- Input (32,128K] - CNY 1.20 /M tokens
- Output (32,128K] - CNY 16.00 /M tokens
- Input (128,256K] - CNY 2.40 /M tokens
- Output (128,256K] - CNY 24.00 /M tokens
- Cache storage - CNY 0.017 /M tokens/hour
- Cache input - CNY 0.16 /M tokens
Seed 1.8 general-purpose version by ByteDance Seed team, a balanced language model. Performs well across dialogue, writing, translation, and summarization tasks. Deeply optimized for Chinese scenarios, understanding Chinese context and cultural background, suitable for daily use.
Pricing
- Input(0-128k) - $0.25 /M tokens
- Input(128k-256k) - $0.50 /M tokens
- Cache Hit - $0.05 /M tokens
- Output(0-128k) - $2.00 /M tokens
- Output(128k-256k) - $4.00 /M tokens
- Cache Storage - $0.0083 /M tokens/hour
Pinned Seed-1.8 snapshot dated 251228, a BytePlus language model optimized for Chinese context and general-purpose tasks.
Pricing
- Prompt [0,128K] - $0.25 /M tokens
- Cache-hit - $0.05 /M tokens
- Output - $2.00 /M tokens
Pinned Doubao Seed-1.8 snapshot dated 251228. ByteDance's Doubao version of the Seed-1.8 model for Chinese dialogue and text generation tasks.
Pricing
- Prompt [0,128K] - $0.25 /M tokens
- Cache-hit - $0.05 /M tokens
- Output - $2.00 /M tokens
Claude Opus 4.8 is Anthropic's most powerful model, offering exceptional reasoning capabilities, deep knowledge understanding, and nuanced language processing. Designed for the most demanding tasks requiring complex analysis, creative writing, and sophisticated problem-solving.
Pricing
- Input - $5.00 /M tokens
- Cache writes(5m) - $6.25 /M tokens
- Cache writes(1h) - $10.00 /M tokens
- Cache hits & refreshes - $0.50 /M tokens
- Output - $25.00 /M tokens
Claude Opus 4.7 is Anthropic's most powerful model, offering exceptional reasoning capabilities, deep knowledge understanding, and nuanced language processing. Designed for the most demanding tasks requiring complex analysis, creative writing, and sophisticated problem-solving.
Pricing
- Input - $5.00 /M tokens
- Cache writes(5m) - $6.25 /M tokens
- Cache writes(1h) - $10.00 /M tokens
- Cache hits & refreshes - $0.50 /M tokens
- Output - $25.00 /M tokens
Claude Sonnet 4.5 is Anthropic's latest flagship model, achieving world-leading performance in AI agents, programming, and computer usage. Features enhanced knowledge base and exceptional long-text processing capabilities. Particularly suitable for complex long-term tasks, code review, and technical documentation. Excels in accuracy and attention to detail.
Pricing
- Input - $3.00 /M tokens
- Cache writes(5m) - $3.75 /M tokens
- Cache writes(1h) - $6.00 /M tokens
- Cache hits & refreshes - $0.30 /M tokens
- Output - $15.00 /M tokens
Claude Sonnet 4.6 is Anthropic's latest flagship model, achieving world-leading performance in AI agents, programming, and computer usage. Features enhanced knowledge base and exceptional long-text processing capabilities. Particularly suitable for complex long-term tasks, code review, and technical documentation. Excels in accuracy and attention to detail.
Pricing
- Input - $3.00 /M tokens
- Cache writes(5m) - $3.75 /M tokens
- Cache writes(1h) - $6.00 /M tokens
- Cache hits & refreshes - $0.30 /M tokens
- Output - $15.00 /M tokens
Claude Haiku 4.5 is a fast, affordable, and highly capable AI model excelling at programming and agentic tasks. Perfectly combines speed with low cost, ideal for real-time applications, large-scale deployments, and scenarios requiring rapid responses. Provides industry-leading cost-effectiveness while maintaining high-quality output.
Pricing
- Input - $1.00 /M tokens
- Cache writes(5m) - $1.25 /M tokens
- Cache writes(1h) - $2.00 /M tokens
- Cache hits & refreshes - $0.10 /M tokens
- Output - $5.00 /M tokens
Claude Opus 4.5 is Anthropic's most powerful model, offering exceptional reasoning capabilities, deep knowledge understanding, and nuanced language processing. Designed for the most demanding tasks requiring complex analysis, creative writing, and sophisticated problem-solving.
Pricing
- Input - $5.00 /M tokens
- Cache writes(5m) - $6.25 /M tokens
- Cache writes(1h) - $10.00 /M tokens
- Cache hits & refreshes - $0.50 /M tokens
- Output - $25.00 /M tokens
Claude Opus 4.1 delivers premium intelligence with advanced reasoning and analysis capabilities. Excels at complex research tasks, detailed content creation, and nuanced decision-making requiring deep contextual understanding.
Pricing
- Input - $15.00 /M tokens
- Cache writes(5m) - $18.75 /M tokens
- Cache writes(1h) - $30.00 /M tokens
- Cache hits & refreshes - $1.50 /M tokens
- Output - $75.00 /M tokens
Claude Sonnet 4 provides an excellent balance of intelligence and speed. Features strong coding capabilities, nuanced understanding, and efficient processing for everyday tasks and complex workflows alike.
Pricing
- Input - $3.00 /M tokens
- Cache writes(5m) - $3.75 /M tokens
- Cache writes(1h) - $6.00 /M tokens
- Cache hits & refreshes - $0.30 /M tokens
- Output - $15.00 /M tokens
Claude Opus 4 is a fast, affordable, and highly capable AI model excelling at programming and agentic tasks. Perfectly combines speed with low cost, ideal for real-time applications, large-scale deployments, and scenarios requiring rapid responses. Provides industry-leading cost-effectiveness while maintaining high-quality output.
Pricing
- Input - $1.00 /M tokens
- Cache writes(5m) - $1.25 /M tokens
- Cache writes(1h) - $2.00 /M tokens
- Cache hits & refreshes - $0.10 /M tokens
- Output - $5.00 /M tokens
Claude Haiku 3.5 offers impressive speed and cost-efficiency while maintaining strong capabilities. Perfect for high-volume applications, customer service, and tasks requiring quick responses with reliable quality.
Pricing
- Input - $0.80 /M tokens
- Cache writes(5m) - $1.00 /M tokens
- Cache writes(1h) - $1.60 /M tokens
- Cache hits & refreshes - $0.08 /M tokens
- Output - $4.00 /M tokens
Claude Haiku 3 is the most affordable Claude model, providing fast and efficient responses for simple tasks. Ideal for high-volume, cost-sensitive applications where speed is prioritized over complexity.
Pricing
- Input - $0.25 /M tokens
- Cache writes(5m) - $0.30 /M tokens
- Cache writes(1h) - $0.50 /M tokens
- Cache hits & refreshes - $0.03 /M tokens
- Output - $1.25 /M tokens
Claude Opus 4.6 extended thinking model with visible reasoning chains, ideal for deep analytical and complex problem-solving tasks.
Pricing
- Input - $5.00 /M tokens
- Output - $25.00 /M tokens
Claude Sonnet 4.6 extended thinking variant with transparent reasoning output, combining Sonnet-class speed with deep inferential ability.
Pricing
- Base Input - $15.00 /M tokens
- 5m Cache Writes - $18.75 /M tokens
- 1h Cache Writes - $30.00 /M tokens
- Cache Hits - $1.50 /M tokens
- Output - $75.00 /M tokens
Pinned Claude Opus 4.1 extended thinking snapshot dated 2025/08/05 for stable, versioned production deployments requiring deep reasoning.
Pricing
- Base Input - $15.00 /M tokens
- 5m Cache Writes - $18.75 /M tokens
- 1h Cache Writes - $30.00 /M tokens
- Cache Hits - $1.50 /M tokens
- Output - $75.00 /M tokens
Pinned Claude Opus 4 extended thinking snapshot dated 2025/05/14, offering deep reasoning with a stable model version.
Pricing
- Base Input - $15.00 /M tokens
- 5m Cache Writes - $18.75 /M tokens
- 1h Cache Writes - $30.00 /M tokens
- Cache Hits - $1.50 /M tokens
- Output - $75.00 /M tokens
Pinned Claude Opus 4.1 snapshot dated 2025/08/05 for workloads requiring a stable, versioned Opus release.
Pricing
- Base Input - $5.00 /M tokens
- 5m Cache Writes - $6.25 /M tokens
- 1h Cache Writes - $10.00 /M tokens
- Cache Hits - $0.50 /M tokens
- Output - $25.00 /M tokens
Pinned Claude Opus 4 snapshot dated 2025/05/14 for stable, versioned production deployments.
Pricing
- Base Input - $15.00 /M tokens
- 5m Cache Writes - $18.75 /M tokens
- 1h Cache Writes - $30.00 /M tokens
- Cache Hits - $1.50 /M tokens
- Output - $75.00 /M tokens
Pinned Claude Opus 4.5 snapshot dated 2025/11/01 for stable production deployments.
Pricing
- Base Input - $5.00 /M tokens
- 5m Cache Writes - $6.25 /M tokens
- 1h Cache Writes - $10.00 /M tokens
- Cache Hits - $0.50 /M tokens
- Output - $25.00 /M tokens
Pinned Claude Sonnet 4.5 snapshot dated 2025/09/29 for stable, versioned production deployments.
Pricing
- Base Input - $3.00 /M tokens
- 5m Cache Writes - $3.75 /M tokens
- 1h Cache Writes - $6.00 /M tokens
- Cache Hits - $0.30 /M tokens
- Output - $15.00 /M tokens
Pinned Claude Sonnet 4 snapshot dated 2025/05/14 for stable, versioned production deployments.
Pricing
- Base Input - $3.00 /M tokens
- 5m Cache Writes - $3.75 /M tokens
- 1h Cache Writes - $6.00 /M tokens
- Cache Hits - $0.30 /M tokens
- Output - $15.00 /M tokens
Pinned Claude Haiku 4.5 snapshot dated 2025/10/01 for stable, versioned production deployments.
Pricing
- Base Input - $0.80 /M tokens
- 5m Cache Writes - $1.00 /M tokens
- 1h Cache Writes - $1.60 /M tokens
- Cache Hits - $0.08 /M tokens
- Output - $4.00 /M tokens
Pinned Claude 3.5 Haiku snapshot dated 2024/10/22, offering fast and affordable performance for high-volume tasks.
Pricing
- Base Input - $3.00 /M tokens
- 5m Cache Writes - $3.75 /M tokens
- 1h Cache Writes - $6.00 /M tokens
- Cache Hits - $0.30 /M tokens
- Output - $15.00 /M tokens
OpenAI GPT-4.1 model with enhanced capabilities and improved performance.
Pricing
- Input - $2.00 /M tokens
- Cached input - $0.50 /M tokens
- Output - $8.00 /M tokens
OpenAI's latest flagship model GPT-5, achieving cross-domain breakthroughs in programming, reasoning, and AI agent tasks. Features stronger understanding capabilities, more accurate reasoning processes, and more natural interaction experiences. Supports complex multi-step task planning and execution, representing the current highest level of large language models.
Pricing
- Input - $1.25 /M tokens
- Cached input - $0.125 /M tokens
- Output - $10.00 /M tokens
OpenAI GPT-5-mini is a lower-cost variant optimized for lightweight text workloads and high-throughput applications.
Pricing
- Input - $0.25 /M tokens
- Cached input - $0.025 /M tokens
- Output - $2.00 /M tokens
OpenAI GPT-5.1 model with enhanced capabilities and improved performance over GPT-5.
Pricing
- Input - $1.25 /M tokens
- Cached input - $0.125 /M tokens
- Output - $10.00 /M tokens
OpenAI GPT-5.2 model with enhanced capabilities.
Pricing
- Input - $1.75 /M tokens
- Cached input - $0.175 /M tokens
- Output - $14.00 /M tokens
OpenAI GPT-5.3-chat is a chat-optimized GPT-5.3 variant for conversational and assistant-style applications.
Pricing
- Input - $1.75 /M tokens
- Cached input - $0.175 /M tokens
- Output - $14.00 /M tokens
OpenAI GPT-5-3-chat model variant for conversational and assistant-style applications.
Pricing
- Input - $1.75 /M tokens
- Cached input - $0.175 /M tokens
- Output - $14.00 /M tokens
OpenAI GPT-5.3-codex model optimized for coding tasks.
Pricing
- Input - $1.75 /M tokens
- Cached input - $0.175 /M tokens
- Output - $14.00 /M tokens
OpenAI GPT-5.4 model with enhanced capabilities and improved performance.
Pricing
- Input - $2.50 /M tokens
- Cached input - $0.25 /M tokens
- Output - $15.00 /M tokens
OpenAI GPT-5.4-mini model balancing quality and efficiency for cost-sensitive chat workloads.
Pricing
- Input - $0.75 /M tokens
- Cached input - $0.075 /M tokens
- Output - $4.50 /M tokens
OpenAI GPT-5.4-nano ultra-low-cost model for lightweight, high-throughput chat scenarios.
Pricing
- Input - $0.20 /M tokens
- Cached input - $0.02 /M tokens
- Output - $1.25 /M tokens
OpenAI GPT-4o model optimized for multimodal tasks with improved speed and efficiency.
Pricing
- Input - $2.50 /M tokens
- Cached input - $1.25 /M tokens
- Output - $10.00 /M tokens
OpenAI GPT-5.5 model for premium long-context and advanced reasoning workloads.
Pricing
- Input - $5.00 /M tokens
- Cached input - $0.50 /M tokens
- Output - $30.00 /M tokens
FlashVSR is a fast, high-quality video upscaler that boosts resolution and restores clarity for low-resolution or blurry footage. Ready-to-use REST inference API, best performance, no coldstarts, affordable pricing.
Pricing
- 4K - $0.16 /5s
- 2K - $0.12 /5s
- 1080p - $0.09 /5s
- 720p - $0.06 /5s
Kimi-k2.5 is a next-generation flagship model powered by reinforcement learning technology, renowned for its exceptional logical reasoning, autonomous thinking capabilities, and high instruction-following accuracy. Features native chain-of-thought (CoT) output with transparent reasoning via the reasoning_content field, enabling superior performance on complex tasks.
Pricing
- Prompt Input - $0.571429 /M tokens
- Cache-hit Input - $0.10 /M tokens
- Output - $3.00 /M tokens
MiMo-V2.5-Pro is Xiaomi's latest native omnimodal model with 310B total parameters (15B active), supporting text, image, video, and audio understanding within a unified Sparse MoE architecture. Built upon MiMo-V2-Flash's hybrid sliding-window attention backbone, augmented with a 729M-param Vision Transformer and a dedicated audio encoder. It surpasses MiMo-V2-Pro in agentic performance and supports up to 1M tokens context window.
Pricing
- Input (cache hit) - $0.0036 /M tokens
- Input (cache miss) - $0.435 /M tokens
- Output - $0.87 /M tokens
MiMo-V2.5 is Xiaomi's native omnimodal model supporting text, image, video, and audio understanding within a unified Sparse MoE architecture. Built on a hybrid sliding-window attention backbone with vision and audio encoders, designed for cost-efficient multimodal inference.
Pricing
- Input (cache hit) - $0.0028 /M tokens
- Input (cache miss) - $0.14 /M tokens
- Output - $0.28 /M tokens
MiMo-V2-Pro is Xiaomi's text large language model in the MiMo-V2 series, tuned for strong agentic performance, complex reasoning and instruction following.
Pricing
- Input (cache hit) - $0.0036 /M tokens
- Input (cache miss) - $0.435 /M tokens
- Output - $0.87 /M tokens
MiMo-V2-Omni is Xiaomi's omnimodal model in the MiMo-V2 series, supporting unified text, image, video, and audio understanding for general multimodal tasks.
Pricing
- Input (cache hit) - $0.0028 /M tokens
- Input (cache miss) - $0.14 /M tokens
- Output - $0.28 /M tokens
MiMo-V2-Flash is Xiaomi's lightweight, low-latency text model in the MiMo-V2 series, optimized for high-throughput and cost-sensitive scenarios.
Pricing
- Input (cache hit) - $0.01 /M tokens
- Input (cache miss) - $0.10 /M tokens
- Output - $0.30 /M tokens