DeepSeek-V3: Challenging the "Brute-Force Computing" Paradigm and Ushering in a New Era for AI
In today's fiercely competitive large language model (LLM) market, a model's quality is often reduced to a single measure: "computing power investment," sometimes literally counted in "number of NVIDIA GPUs." This is not entirely unreasonable. Almost all current LLMs, open-source and closed-source alike, are built on the Transformer architecture that Google researchers introduced in 2017. From Tesla's FSD to OpenAI's ChatGPT, these products are all applications of the Transformer. As the movie "The Shadow Play" puts it, "They were all taught by the same master, so they can't break each other's moves." The core difference in today's LLM competition therefore lies in the amount of training data and the scale of computing power, that is, in the accumulation of "experience."
OpenAI reportedly used 25,000 NVIDIA A100 GPUs to train GPT-4. It is also reported that OpenAI now owns at least 400,000 NVIDIA H100 and GH200 chips. Oracle co-founder Larry Ellison has even "begged" NVIDIA CEO Jensen Huang to reserve enough chips for Oracle and Tesla. That episode shows how scarce and important computing resources are in this field. If this "brute-force computing" model continues, the industry landscape will remain hard to change.
However, this seemingly settled situation is now being disrupted. In late December 2024, DeepSeek-V3, a large language model developed by the Chinese startup DeepSeek, drew wide attention from the US and European tech communities. The model has clear advantages in technical performance, open-source availability, and cost-effectiveness, and it has been well received across the industry.
Results from the independent testing agency ArtificialAnalysis show DeepSeek-V3 ahead of open-source models such as Meta's Llama 3.1-405B and Alibaba's Qwen 2.5-72B. Its lead covers text understanding, coding, mathematics, and subject knowledge. Its performance is comparable to top closed-source models such as OpenAI's GPT-4 and Anthropic's Claude 3.5 Sonnet. DeepSeek-V3's strong capabilities in Chinese-language processing, coding, and mathematical computation suggest considerable potential in education and research.
DeepSeek-V3 does more than outperform open-source competitors such as Meta's Llama 3.1 and Alibaba's Qwen 2.5. In some areas it rivals top closed-source models. Closed-source models usually outperform open-source ones because they have more data and more training resources. DeepSeek-V3's exceptional strength in code generation and mathematical computation challenges this conventional wisdom.
Even more remarkable is how cheaply and efficiently DeepSeek-V3 was trained. Andrej Karpathy, a founding member of OpenAI, has pointed out that training a model comparable to DeepSeek-V3 typically takes 16,000 to 100,000 GPUs. DeepSeek used only 2,048 GPUs and finished training in 57 days, at a total cost of approximately $5.576 million. That is roughly one-tenth the cost of other mainstream models such as GPT-4.
To put DeepSeek-V3's efficiency advantage in more concrete terms: DeepSeek used 2,048 NVIDIA H800 GPUs and about two months to train a 671-billion-parameter model, at a cost of approximately $5.6 million. Silicon Valley companies training a model of comparable capability would typically use higher-end NVIDIA GPUs rather than the relatively cheaper H800. They would also need at least 16,000 of those GPUs to reach a similar level, nearly eight times the roughly 2,000 H800s DeepSeek used. Measured by compute consumed, DeepSeek-V3's training took only about one-eleventh of what a comparable Silicon Valley model required. Meta's cost to train a model of similar capability would run into the hundreds of millions of dollars, which makes DeepSeek's cost-effectiveness hard to match.
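The headline cost figures can be sanity-checked with simple arithmetic. The sketch below is a back-of-the-envelope estimate, not DeepSeek's own accounting, and it rests on three assumptions that do not come from this article:

- every GPU runs around the clock for the full 57 days;
- H800 time rents for $2 per GPU-hour;
- Meta's Llama 3.1-405B took roughly 30.8 million GPU-hours to train, used here as the comparison point.

```python
# Back-of-the-envelope check of the DeepSeek-V3 training figures.
# ASSUMPTIONS (not from the article): GPUs run 24 hours a day for all 57 days,
# an assumed rental price of $2 per H800 GPU-hour, and an assumed ~30.8M
# GPU-hours for Llama 3.1-405B as the comparison point.

NUM_GPUS = 2048                 # H800 GPUs, from the article
TRAINING_DAYS = 57              # from the article
PRICE_PER_GPU_HOUR = 2.0        # assumed rental price in USD
LLAMA_405B_GPU_HOURS = 30.8e6   # assumed comparison figure


def gpu_hours(num_gpus: int, days: float) -> float:
    """Total GPU-hours if every GPU runs around the clock for `days` days."""
    return num_gpus * days * 24


def training_cost(hours: float, price_per_hour: float) -> float:
    """Rental-equivalent compute cost only (no staff, research, or failed runs)."""
    return hours * price_per_hour


hours = gpu_hours(NUM_GPUS, TRAINING_DAYS)
cost = training_cost(hours, PRICE_PER_GPU_HOUR)
ratio = LLAMA_405B_GPU_HOURS / hours  # how many times more compute the comparison used

print(f"GPU-hours: {hours / 1e6:.2f}M")
print(f"Estimated cost: ${cost / 1e6:.2f}M")
print(f"Comparison model used ~{ratio:.1f}x the GPU-hours")
```

The estimate comes out to about 2.80 million GPU-hours and $5.6 million, close to the reported $5.576 million. At the assumed $2 rate, the reported figure works out to about 2.788 million GPU-hours, or slightly under 57 full days on 2,048 GPUs. The ratio of about 11 lines up with the "one-eleventh" figure, though only under the stated assumptions.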
Hu Yanping, an expert at the Pangolin Think Tank, believes DeepSeek-V3's success shows that the industry-specific LLM route is viable, even though a gap remains relative to general-purpose LLMs. China's path for AI LLM development emphasizes "industrialization," meaning the application of AI within specific industries. Industry-specific LLMs therefore have better product-market fit and align more closely with China's goal of using AI to empower a wide range of industries.
Notably, the release of DeepSeek-V3 also caused fluctuations in NVIDIA's stock price. Some Wall Street analysts believe DeepSeek-V3 has shaken the market's confidence in the "brute-force computing" approach to LLM development and may signal a shift in the field's direction. DeepSeek-V3's success offers new insights into LLM training and introduces a new variable into the global AI industry. Its low-cost, high-efficiency training method may become a key direction for future LLM development. This matters for the Chinese AI industry and for the global AI landscape as a whole. The emergence of DeepSeek-V3 marks a new stage in LLM competition. Future competition may no longer center solely on the scale of computing power. Instead, it may turn on how effectively computing resources are used, how efficiently models are trained, and how successfully models are industrialized. This will be a new era full of both opportunities and challenges.