DeepSeek & Nvidia - David vs Goliath
China's new answer to ChatGPT shakes AI stocks and chipmakers.
DeepSeek, a Chinese AI startup, has introduced its latest model, R1, which matches the performance of OpenAI's o1 at just 3% of the cost. This achievement is attributed to R1's reliance on pure reinforcement learning (RL), a departure from the traditional supervised fine-tuning methods commonly used in training large language models. By focusing on RL, DeepSeek has developed a model that excels in complex reasoning tasks, including mathematics and coding, without the extensive computational resources typically required. The open-source nature of R1 has garnered significant attention, with over 109,000 downloads on platforms like Hugging Face. This development challenges existing assumptions about the resources necessary for cutting-edge AI performance and offers enterprises a cost-effective alternative to proprietary models.
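For readers who want the intuition behind that claim, here is a minimal, purely illustrative sketch (assuming a Hugging Face-style causal LM interface) of the difference between supervised fine-tuning, which imitates reference answers token by token, and a reward-driven RL update of the kind DeepSeek describes for verifiable tasks such as math and coding. This is not DeepSeek's actual training code; `sample_fn`, `check_answer` and the fields on the sample objects are hypothetical stand-ins.

```python
# Illustrative sketch only: supervised fine-tuning vs. a reward-driven RL update.
# Assumes a Hugging Face-style causal LM (forward(input_ids, labels) returns .loss).
import torch

def sft_step(model, optimizer, full_ids):
    """Supervised fine-tuning: imitate a reference answer token by token."""
    loss = model(input_ids=full_ids, labels=full_ids).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

def rl_step(model, optimizer, prompt_ids, sample_fn, check_answer, n_samples=8):
    """Reward-driven update: sample several answers, score them with an automatic
    checker (does the math answer match, does the unit test pass), and reinforce
    the samples that scored above the group average."""
    samples = [sample_fn(model, prompt_ids) for _ in range(n_samples)]
    rewards = torch.tensor([check_answer(s.text) for s in samples], dtype=torch.float)
    advantages = rewards - rewards.mean()            # group-relative baseline
    loss = 0.0
    for sample, adv in zip(samples, advantages):
        loss = loss - adv * sample.log_prob_sum      # policy-gradient term
    (loss / n_samples).backward()
    optimizer.step()
    optimizer.zero_grad()
```

The key difference for cost: the RL path needs no human-written reference answers, only an automatic checker, which is part of why pure-RL training on verifiable tasks can be comparatively cheap.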
Nvidia shares plunged 17% on Monday, resulting in a market cap loss of close to US$ 600nm, the biggest drop ever for a U.S. company. The sell-off, which hit much of the U.S. tech sector, was sparked by concerns about increased competition from Chinese AI lab DeepSeek. Data center companies that rely on Nvidia chips also plummeted, with Dell, Oracle and Super Micro Computer all falling by at least 8.7%.
Nvidia commented on DeepSeek: “DeepSeek is an excellent AI advancement and a perfect example of Test Time Scaling. DeepSeek’s work illustrates how new models can be created using that technique, leveraging widely-available models and compute that is fully export control compliant. Inference requires significant numbers of NVIDIA GPUs and high-performance networking. We now have three scaling laws: pre-training and post-training, which continue, and new test-time scaling.”
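For context on what Nvidia means by test-time scaling: spend more compute at inference time, for example by sampling many candidate answers and keeping the most common one (self-consistency). A minimal sketch of that one form, with hypothetical `generate` and `extract_final_answer` helpers:

```python
# Minimal illustration of one form of test-time scaling: more inference compute
# via many sampled candidates and a majority vote (self-consistency).
# `generate` and `extract_final_answer` are hypothetical helpers.
from collections import Counter

def answer_with_test_time_scaling(generate, extract_final_answer, prompt, n_samples=32):
    candidates = [extract_final_answer(generate(prompt, temperature=0.8))
                  for _ in range(n_samples)]
    # More samples -> more GPU inference work, but usually a better final answer.
    best, _ = Counter(candidates).most_common(1)[0]
    return best
```

This is the link Nvidia is drawing: even if training gets cheaper, inference done this way scales linearly with the number of samples, and that compute still runs on GPUs.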
Street comments
Bernstein (Stacy Rasgon): Believes the panic is overblown; maintains Outperform ratings on Nvidia and Broadcom.
Cantor Fitzgerald: DeepSeek V3 is Actually Very Bullish for Compute and NVDA. "Following release of DeepSeek's V3 LLM, there has been great angst as to the impact for compute demand, and therefore, fears of peak spending on GPUs. We think this view is farthest from the truth and that the announcement is actually very bullish with AGI seemingly closer to reality and Jevons Paradox almost certainly leading to the AI industry wanting more compute, not less. We would be buyers of NVDA shares on any potential weakness."
UBS maintains a Buy rating on Nvidia, dismissing concerns about near-term "air pockets" and expecting strong FQ4 and FQ1 results, with Blackwell chip yields improving and Blackwell rapidly replacing Hopper in the mix. They believe hardware issues, particularly with connector cartridges, are improving, and rack shipments are already underway, with major suppliers Hon Hai and Quanta confirming volume production schedules. UBS projects Blackwell revenue at ~US$ 9bn in the January quarter (up from their previous ~US$ 5bn estimate), maintaining their overall FQ4 estimate of ~US$ 42bn (Data Center ~US$ 38bn) and FQ1 estimate of ~US$ 47bn, with their price target remaining at US$ 185.
BofA dismisses concerns about DeepSeek's claims of building an AI model for just US$ 6m that outperforms GPT-4, suggesting the model is likely "distilled" from larger foundation models like Meta's Llama rather than being truly independent. They point to Meta's planned 56% increase in 2025 capex to US$ 60-65bn as evidence that foundational AI models continue to require significant and rising infrastructure investments. BofA maintains their bullish outlook on AI semiconductor demand, driven by a mix of large foundation models, derivative models, and widespread inference needs, leading them to maintain Buy ratings on Nvidia, Broadcom, and Marvell Technology.
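For context on what "distilled" means here: in broad terms, a smaller student model is trained to reproduce a larger teacher model's output distribution rather than being trained from scratch. The sketch below is a generic illustration of that idea (standard softened-KL distillation), not a claim about how DeepSeek actually built its models; the function names and temperature are assumptions.

```python
# Generic knowledge-distillation sketch: a small "student" learns to match a
# larger "teacher" model's output distribution. Illustrative only.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student token distributions."""
    t_probs = F.softmax(teacher_logits / temperature, dim=-1)
    s_logprobs = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(s_logprobs, t_probs, reduction="batchmean") * temperature ** 2

def distill_step(student, teacher, optimizer, input_ids):
    with torch.no_grad():
        teacher_logits = teacher(input_ids=input_ids).logits  # frozen teacher
    student_logits = student(input_ids=input_ids).logits
    loss = distillation_loss(student_logits, teacher_logits)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

BofA's point follows from this structure: a cheap distilled student still presupposes an expensive teacher somewhere upstream, which is why they see continued capex in large foundation models.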
Mizuho advises against panic-selling Nvidia stock but remains cautious about buying dips, given the crowded AI trade. They're highly skeptical of Chinese company DeepSeek's claims about achieving low AI training costs (US$ 6m), questioning their legitimacy and timing amid US-China tech tensions. This market reaction shows how volatile AI-related stocks have become, shifting rapidly from excitement over US data center initiatives to panic selling based on unverified claims.
Wells Fargo also noted that the DeepSeek news is roiling data center stocks, saying that while it could create risks to longer-term hyperscale training capex, it may also accelerate inference demand growth and enterprise adoption of AI (though they doubt near-term hyperscale spend is at risk).
Citi (Atif Malik): “While the dominance of the US companies on the most advanced AI models could be potentially challenged, that said, we estimate that in an inevitably more restrictive environment, US’ access to more advanced chips is an advantage. Thus, we don’t expect leading AI companies would move away from more advanced GPUs which provide more attractive $/TFLOPs at scale. We see the recent AI capex announcements like Stargate as a nod to the need for advanced chips. Maintain Buy PT US$ 175”
DeepSeek could flatten nuclear energy demand. BMO says DeepSeek's server power intensity may be 50%-75% below that of the latest models. Demand for U.S. nuclear energy could be decimated by China's AI breakthrough with DeepSeek, but it's too early to know for sure, BMO analyst Subash Chandra tells WSJ. Demand for nuclear power is expected to grow at a compound annual growth rate of 2% to 5% over the next 10 years, as data center projects built to fuel AI growth in the U.S. look to cleaner energy sources, Chandra says. DeepSeek's reported energy efficiency could drag that down to as low as a 1% to 2% CAGR if its success is replicated and verified, Chandra says. The S&P 500 tech sector index closed down 5.5%.
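To put those scenarios in perspective, a quick back-of-the-envelope compounding check (illustrative arithmetic only):

```python
# Cumulative growth over 10 years implied by each compound annual growth rate (CAGR).
for cagr in (0.01, 0.02, 0.05):
    growth = (1 + cagr) ** 10 - 1
    print(f"{cagr:.0%} CAGR over 10 years -> ~{growth:.0%} cumulative growth")
# 1% -> ~10%, 2% -> ~22%, 5% -> ~63%
```

Cutting the CAGR from 5% to 1%-2% roughly turns ~63% cumulative demand growth over the decade into ~10%-22%, which is why the efficiency claim matters so much for nuclear and power names.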
Wedbush: "No US Global 2000 company will use Chinese start-up DeepSeek for AI infrastructure, as Nvidia remains the dominant player in autonomous systems, robotics, and broader AI applications. While DeepSeek's LLM model is impressive, it lacks the scale and capacity to compete in AI infrastructure. Despite market fears and events like Fed actions or Nvidia delays, the tech rally, led by giants like Nvidia, Microsoft, and Amazon, will continue. DeepSeek's advancements, though notable, don't pose a significant threat to the established tech ecosystem."
Raymond James (Srini Pajjuri): Suggests that if DeepSeek's innovations are adopted, training costs could decrease, questioning the need for large GPU clusters.
TechCrunch: DeepSeek, the viral AI company, has released a new set of multimodal AI models that it claims can outperform OpenAI’s DALL-E 3. The models, which are available for download from the AI dev platform Hugging Face, are part of a new model family that DeepSeek is calling Janus-Pro. DeepSeek, a Chinese AI lab funded largely by the quantitative trading firm High-Flyer Capital Management, broke into the mainstream consciousness this week after its chatbot app rose to the top of the Apple App Store charts.
Deutsche Bank on NVDA: "So this is a company that has gone from relative earnings obscurity to one of the most profitable in the world inside two years and the largest company in the world as of Friday night. The problem is that the AI industry is embryonic. And it's almost impossible to know how it will develop or what competition current winners might face even if you fully believe in its potential to drive future productivity. The stratospheric rise of DeepSeek reminds us of this."
Yesterday/overnight news on DeepSeek
“Currently, only registration with a mainland China mobile phone number is supported,” DeepSeek says on its status page.
DeepSeek was hit by a 'large-scale' cyber-attack and forced to limit registrations. Nvidia says DeepSeek's "excellent" advancements still require lots of its chips.
DeepSeek dropped another open-source AI model, Janus-Pro-7B. It's multimodal (it can generate images), and DeepSeek says it beats OpenAI's DALL-E 3 and Stable Diffusion on the GenEval and DPG-Bench benchmarks.
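For those who want to inspect the release, a hedged sketch of pulling the weights with the huggingface_hub client; the repo id shown is an assumption inferred from the model name, so check DeepSeek's Hugging Face organization page for the exact identifier.

```python
# Sketch: download the released weights from Hugging Face.
# The repo id is an assumption based on the model name reported above.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="deepseek-ai/Janus-Pro-7B")
print(f"Model files downloaded to: {local_dir}")
```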
Scale AI CEO Alexandr Wang said that DeepSeek reportedly has up to 50,000 Nvidia H100s that it cannot disclose because of U.S. export controls, rather than the 10,000 A100s it claims. Elon Musk, drawing on his experience with xAI, agreed with Wang's assessment.
Other victims in yesterday's sell-off: data center infrastructure names (VRT, ETN, PWR), server makers (SMCI, DELL), and nuclear/power names (CEG, NRG, VST, TLN, SMR, OKLO) that are seen as needed to power the AI revolution.
Source: Ashenden and various external sources
DeepSeek is a bubble inside the AI bubble
Good overview!