Underwhelming or Underrated? DeepSeek V4 Delivers Impressive Gains

DeepSeek’s V4 Model Falls Short of US and Domestic Rivals, But Shows Promise

DeepSeek’s latest flagship model, V4, has fallen short of its domestic and US competitors, according to a recent analysis. The Chinese artificial intelligence company is struggling to replicate the market success of its earlier release, R1.

The most advanced version of the V4 model, V4 Pro, ranked second among the world’s leading open-source models, behind Moonshot AI’s Kimi K2.6, as reported by benchmark firm Artificial Analysis. While V4 Pro showed clear improvements over its predecessor, V3.2, it still lagged behind top competitors.

V4 Pro scored 52 on the Artificial Analysis Intelligence Index, compared with 54 for Kimi K2.6, which was released earlier this week. In contrast, leading closed-source models from the US—such as OpenAI’s GPT-5.5, Anthropic’s Claude Opus, and Google’s Gemini 3.1 Pro—scored 60, 57, and 57, respectively.

This outcome highlights the challenges DeepSeek faces as China strives to close the AI gap with the US. Intensifying competition at home and abroad, along with ongoing constraints on computing power, are significant hurdles.

Despite these challenges, analysts noted that V4 delivered meaningful technical progress. Kyle Chan, a research fellow at the Brookings Institution, described the model as “impressive” for coming close to state-of-the-art performance. He highlighted features such as an efficient one-million-token context window and the ability to run on Huawei Technologies’ Ascend 950PR AI chips.

A context window is the amount of information, measured in tokens, that an AI model can process in a single pass. DeepSeek’s previous flagship model had a context window of 128,000 tokens, so V4’s one-million-token window represents roughly an eightfold increase.
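To put the jump from 128,000 to one million tokens in concrete terms, the sketch below converts token counts into approximate amounts of text. The four-characters-per-token figure is a common rule of thumb for English text, not a property of DeepSeek’s tokenizer; actual ratios vary by model and content.

```python
# Rough illustration of what a larger context window means in practice.
# Assumes ~4 characters per English token, a common rule of thumb
# (actual tokenization varies by model and by the text itself).
CHARS_PER_TOKEN = 4

def approx_chars(context_tokens: int) -> int:
    """Approximate characters of text that fit in a context window."""
    return context_tokens * CHARS_PER_TOKEN

old_window = 128_000      # previous flagship's context window
new_window = 1_000_000    # V4's reported context window

print(approx_chars(old_window))   # ~512,000 characters, a short novel
print(approx_chars(new_window))   # ~4,000,000 characters
print(new_window / old_window)    # ~7.8x increase
```

Under this rule of thumb, the new window fits several books’ worth of text where the old one fit roughly one.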

In a report on Saturday, research firm SemiAnalysis praised DeepSeek’s 90% reduction in KV cache memory at a one-million-token context length, calling it “far more impactful than Google’s TurboQuant paper last month.”
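Why a 90% KV-cache reduction matters can be seen with back-of-the-envelope arithmetic. The KV cache stores one key vector and one value vector per layer for every token in the context, so its memory footprint grows linearly with context length. The configuration below (layer count, head count, head dimension) is purely illustrative, not DeepSeek’s published architecture:

```python
# Illustrative KV-cache arithmetic with a hypothetical transformer
# configuration (NOT DeepSeek's actual architecture). The cache holds
# a key and a value vector per layer per token, so memory scales
# linearly with context length.
def kv_cache_bytes(tokens: int, layers: int, kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    # factor of 2 for keys + values; bytes_per_value=2 assumes fp16/bf16
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens

# Hypothetical model: 60 layers, 8 KV heads, head dimension 128
baseline = kv_cache_bytes(1_000_000, layers=60, kv_heads=8, head_dim=128)
print(baseline / 2**30)        # ~229 GiB for a 1M-token context
print(0.1 * baseline / 2**30)  # ~23 GiB after a 90% reduction
```

At a million tokens, a cache of this size would exceed the memory of any single accelerator; cutting it by 90% is what makes such long contexts practical to serve.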

The model is also notable for its compatibility with domestic hardware. Shortly after V4’s release on Friday, Huawei Technologies stated that its Ascend chip range and supernode systems would provide “full support” for running the model for inference.

However, questions remain about how the model was trained. Kyle Chan noted that DeepSeek made no mention of using Chinese chips during training, even as the model continued to trail US frontier systems. That absence of detail raised concerns that the model may have been trained on restricted Nvidia Blackwell chips.

Chris McGuire, a senior fellow at the Council on Foreign Relations, echoed these concerns, stating that the release did little to shift the broader picture of US leadership in AI. He estimated the US remained about seven months ahead, suggesting that the lack of details on training costs or hardware might indicate reliance on restricted Nvidia chips.

DeepSeek did not immediately respond to a request for comment.

Market reaction to the V4 release was more muted than during the debut of its earlier R1 reasoning model. While shares of Chinese chipmakers rallied on Friday following news of V4 and its integration with Huawei hardware, the release failed to trigger the kind of global shock seen last year.

When R1 launched, it wiped hundreds of billions of dollars from US equity markets, with Nvidia shares plunging 17% in a single day. On Friday, however, Nvidia stock rose 4.32%.

Artificial Analysis also flagged potential drawbacks in the new model. Despite gains in knowledge benchmarks, V4 Pro and its lighter V4 Flash variant recorded hallucination rates of 94% and 96%, respectively. The firm further noted that V4 Pro is now more expensive than rival open-source models, including Kimi K2.6 and Zhipu AI’s GLM-5.1, as well as DeepSeek’s own V3.2. Even so, it remained significantly cheaper than leading closed-source systems, according to the benchmark firm.

SemiAnalysis called DeepSeek’s V4 “an exceptional engineering release” that was “just behind” the frontier. While its capabilities were not at the leading edge, the firm suggested that the model could serve as a low-cost alternative to US closed-source systems.
