DeepSeek

DeepSeek usually refers to Hangzhou DeepSeek Artificial Intelligence Basic Technology Research Co., Ltd., a technology company focused on developing advanced large language models (LLMs) and related technologies. The following is a brief introduction:

### Company Background

The company was founded on July 17, 2023 by the well-known quantitative asset-management firm High-Flyer (Huanfang Quantitative). Its registered address is Room 1201, Building 1, West Huijin International Building, No. 169, Huancheng North Road, Gongshu District, Hangzhou, Zhejiang Province, and its legal representative is Pei Tian.

### Technology R&D and Model Releases

- January 5, 2024: Released DeepSeek LLM, a model with 67 billion parameters trained on a 2-trillion-token dataset covering Chinese and English. DeepSeek LLM 7B/67B Base and Chat were open-sourced; the 67B model surpasses Llama 2 70B Base in reasoning, coding, mathematics, and Chinese understanding, and surpasses GPT-3.5 in Chinese performance (a minimal usage sketch for the open-sourced chat model follows this timeline).

- January 25, 2024: Released DeepSeek-Coder, a series of code language models, each trained from scratch on 2 trillion tokens, with a dataset of 87% code and 13% natural language in Chinese and English. It achieved state-of-the-art performance among open-source code models across multiple programming languages and a variety of benchmarks.

- February 5, 2024: Released DeepSeekMath, which builds on DeepSeek-Coder-V1.5 7B and continues pre-training on math-related tokens extracted from Common Crawl, together with natural language and code data, for a total of 500 billion tokens. It scored an excellent 51.7% on the competition-level MATH benchmark, approaching the performance of Gemini Ultra and GPT-4.

- March 11, 2024: Released DeepSeek-VL, an open-source vision-language (VL) model that uses a hybrid vision encoder to efficiently process high-resolution (1024x1024) images within a fixed token budget. The DeepSeek-VL family achieves state-of-the-art or competitive performance across a wide range of vision-language benchmarks at the same model size.

- May 7, 2024: Released DeepSeek-V2, the second-generation open-source mixture-of-experts (MoE) model, with 236 billion total parameters, pre-trained on a diverse, high-quality corpus of 8.1 trillion tokens. It performs well on comprehensive Chinese-language evaluations while keeping inference costs very low (see the MoE routing sketch after this timeline).

- June 17, 2024: Released DeepSeek-Coder-V2, an open-source mixture-of-experts (MoE) code language model with performance comparable to GPT-4 Turbo on code-specific tasks. The number of supported programming languages expanded from 86 to 338, and the context length grew from 16K to 128K tokens.

- December 13, 2024: Released DeepSeek-VL2, a mixture-of-experts vision-language model for advanced multimodal understanding, which demonstrates strong capabilities across a variety of tasks, including visual question answering, optical character recognition, document/table/chart understanding, and visual grounding.

- December 26, 2024: Launched and simultaneously open-sourced DeepSeek-V3, the first model in a new series. Its performance on knowledge tasks improved significantly over the previous-generation DeepSeek-V2.5, and it notably surpassed all other open- and closed-source models on American mathematics competitions (AIME 2024, MATH) and the Chinese National High School Mathematics Olympiad (CNMO 2024). Token generation speed tripled compared with V2.5.
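
As promised in the January 5 entry, here is a minimal sketch of running one of the open-sourced chat models with the Hugging Face `transformers` library. The hub ID `deepseek-ai/deepseek-llm-7b-chat` and the generation settings are illustrative assumptions based on common open-weight releases, not details taken from this article.

```python
# Minimal sketch: chatting with an open-sourced DeepSeek model via transformers.
# The hub ID below is an assumption; check the deepseek-ai organization on the
# Hugging Face Hub for the exact model you want.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-llm-7b-chat"  # assumed hub ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision so the 7B model fits on one GPU
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize mixture-of-experts in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```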
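
The May 7 and June 17 entries describe mixture-of-experts (MoE) models, in which each token activates only a few of the model's many expert sub-networks; this is why a model with 236 billion total parameters can still be cheap to run. The sketch below shows only the core top-k routing idea and is not DeepSeek's actual architecture, which adds shared experts, load-balancing objectives, and other refinements.

```python
# Illustrative top-k MoE layer: each token is routed to its k best-scoring
# experts, and their outputs are combined with normalized router weights.
# This is a teaching sketch, not DeepSeek's real implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)  # per-token expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model)
        scores = self.router(x)                     # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)  # top-k experts per token
        weights = F.softmax(weights, dim=-1)        # normalize over the k picks
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e            # tokens whose slot-th pick is e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Per token, only k of n_experts run, so the active parameter count is a small
# fraction of the total: the property that keeps MoE inference cost low.
moe = TopKMoE(d_model=64, d_ff=256)
print(moe(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```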

### Impact and Evaluation

- International Impact: On February 5, 2025, Bloomberg News published an article stating that even if the United States were to ban DeepSeek's mobile and web applications, its large models, ideas, and code have already spread through the global artificial-intelligence community and are widely used by programmers, including those in the United States.

- Industry Evaluation: A CITIC Securities research report noted that the official release of DeepSeek-V3 has attracted widespread attention across the AI industry. By greatly improving training efficiency and inference speed while preserving model capability, it suggests that large AI models will become increasingly accessible and will help AI applications reach wide deployment; the efficiency gains should also drive growth in demand for inference compute.
