The Models & Pricing page on DeepSeek’s API documentation now lists a limited-time, 75% discount on DeepSeek-V4-Pro token rates — with an explicit expiry in UTC. Under the same table, deepseek-v4-pro and deepseek-v4-flash still show 1M context, 384K max output, and the usual cache hit / cache miss split for input billing. If you are wiring agents, a parallel set of host-specific minimum versions and a 1M-prefixed model id is circulating in integration notes — treat those as a checklist, not a warranty, until you confirm in each tool.

Direct answer: As of the official pricing table, deepseek-v4-pro is offered at a stated 75% reduction from list rates on 1M input tokens (cache hit and miss) and 1M output tokens, valid through 5 May 2026, 15:59 UTC (DeepSeek’s own footnote on the same page). deepseek-v4-flash continues at the non-promotional row shown beside it. Expense is still tokens × price from balance. Re-read the live table before you freeze a budget; this file is a dated snapshot for orientation.

The Pro promo window (what the docs state)

DeepSeek labels the Pro line as a limited-time 75% discount versus the list rates shown with strikethrough styling on Models & Pricing, and it ties that promotion to a hard end: 5 May 2026, 15:59 UTC. If you are price-sensitive, that date is a schedule risk: after it, the UI may return to the higher list row unless DeepSeek extends or replaces the offer — which only the live page can confirm.

The doc also notes that product prices may change and reserves the right to adjust them; the rational move is to re-open the same URL before each planning cycle, not to forward-port a screenshot from a blog.

Per-1M-token table (Flash vs Pro)

Figures below mirror the two-column layout for deepseek-v4-flash and deepseek-v4-pro on the official page: all amounts are USD per one million tokens, with input split into cache hit and cache miss. For deepseek-v4-pro, the current cells reflect the discounted line; the struck amounts are the list values printed on the same table for comparison.

Price component deepseek-v4-flash deepseek-v4-pro (promo vs list)
1M input tokens (cache hit) $0.028 $0.03625 $0.145
1M input tokens (cache miss) $0.14 $0.435 $1.74
1M output tokens $0.28 $0.87 $3.48

Footnote on the vendor page: the deepseek-v4-pro 75% discount is valid through 5 May 2026, 15:59 UTC. Flash is not described with the same promo line in that table as of the snapshot this article is based on — always confirm both columns on Models & Pricing.

Shared API surface callouts on the same documentation include https://api.deepseek.com for OpenAI-style calls and https://api.deepseek.com/anthropic for Anthropic-format calls; context length 1M; max output 384K tokens; thinking mode (with a dedicated guide) for the behaviour you want; JSON output and tool calls marked for both. FIM completion (beta) remains non-thinking mode only. Legacy IDs deepseek-chat and deepseek-reasoner are flagged for future deprecation, with compatibility mapping to deepseek-v4-flash modes in the same doc block.

Integration checklist (agents and 1M context)

DeepSeek whale logo inside a glowing browser window on a dark blue tech background, suggesting API and third-party app integrations.

DeepSeek’s own V4 story already named Claude Code, OpenCode, and OpenClaw as surfaces where the stack is meant to line up. Separately, integration maintainers and community notes are pointing to a few concrete settings for people who want the 1M tier on the API through those hosts. These are operational hints — not sourced to DeepSeek’s pricing file — so verify the exact UI label and build number in each product before you rely on them in production:

  • Claude Code: set the model to deepseek-v4-pro[1m] to select the 1M-context profile that pairs with the API’s 1M window, then confirm the host still exposes that id on your build.
  • OpenCode: use v1.14.24 or newer so you pick up the compatibility path those releases target.
  • OpenClaw: use v2026.4.24 or newer on the same logic.

If you are new to the agent, how to use Skills in Claude Code (2026) is the on-site anchor for the broader workflow. For a local DeepSeek run without cloud billing, Run DeepSeek-V4 on Apple Silicon with mlx-lm walks install trade-offs.

Why cache hit and cache miss still matter

A headline 75% off on Pro does not remove the two input prices. Repeated, stable prefixes (good cache locality) can land on the cheaper cache-hit path; all-new context on a miss is billed at the miss rate. Your real blend depends on prompt design and whatever caching rules DeepSeek’s platform applies- which is why vendor docs still push metered tests instead of napkin math.

Udemy course art: DeepSeek and local AI / MLX-themed promotional graphic for skill-building.
Udemy

DeepSeek and related AI skills (course bundle)

Editor-selected DeepSeek-focused and adjacent AI courses on Udemy (listing titles, module maps, and price change often). Open the page, read the outline, and confirm the syllabus matches your stack before you purchase.

View courses on Udemy

Frequently asked questions

When does the 75% DeepSeek-V4-Pro discount end?

Per the footnote on Models & Pricing, the listed Pro discount is valid through 5 May 2026, 15:59 UTC. Re-check the same page for any extension or replacement offer.

Is deepseek-v4-flash 75% off the same way?

The promo footnote in the same vendor table is tied to deepseek-v4-pro list pricing. Flash is shown in its own column with the single set of three rates — read the live two-column table rather than inferring a discount row that may not exist.

What is deepseek-v4-pro[1m] in Claude Code?

It is a convenience name in that host for pointing the agent at the 1M API context tier for the Pro line. Exact spelling, brackets, and availability can change with product updates; confirm in the host’s own documentation or model picker for your build date.

Do I pay only the discounted Pro rate for every input token?

Input is still split into cache hit and cache miss components at the discounted per-million rates. Output is a separate per million output line. Total spend remains sum of tokens in each category × the active price for that column.

Set a calendar alert a few days before the UTC end time if you are running a burn-in or benchmark window — surprise renewals to list pricing have a way of landing on finance reviews at the wrong moment.