In the GPT era, searching for the computing power fulcrum of the iFLYTEK Spark model

The emergence of large models is still ongoing.

At the Zhongguancun Forum held at the end of May, experts revealed that 79 large models with more than 1 billion parameters had already been released in China, and another batch of large models was unveiled at the recently concluded World Artificial Intelligence Conference.

The popularity of large models remains high; the World Artificial Intelligence Conference even set up an exhibition area themed "Towards Artificial General Intelligence", showcasing more than 30 large models from home and abroad.

What lies beneath the surface of this wave of large models?

At the Ascend AI Industry Summit Forum, Hu Guoping, senior vice president of iFLYTEK and director of its National Key Laboratory of Cognitive Intelligence, demonstrated the capabilities of the Spark model live. Extending outward from the Spark model itself, the supporting role of the computing base has become the focal point of the large model competition.

Starting late but arriving first, the Spark model squeezes into the top tier

It must be said that iFLYTEK has a keen nose for opportunity.

Just 15 days after OpenAI released ChatGPT on November 30 last year, on December 15, iFLYTEK launched a special research program on the "1+N" cognitive intelligence large model. More than five months later, on May 6, the Spark cognitive large model was officially released; one month after that, on June 9, Spark V1.5 followed.

According to iFLYTEK's plan, the Spark model will receive two more major upgrades this year, namely:

On August 15, an upgrade to code capability and improved multimodal interaction;

On October 24, a general-purpose model benchmarked against ChatGPT, surpassing the current version of ChatGPT in Chinese, reaching a comparable level in English, and leading the industry in fields such as education and healthcare.

One point worth noting: unlike other large models, the Spark cognitive large model adopts a "1+N" architecture, where "1" refers to the general cognitive intelligence large model and "N" refers to its deployment in vertical domains.

According to Hu Guoping, the Spark model has already landed in fields such as education, office work, automotive, healthcare, and industry, achieving zero-to-one innovative applications in multiple industry scenarios.

Talk is cheap, so let's look at Hu Guoping's live demonstration of the Spark model's actual performance.

The first test was the Spark model's text generation. Hu Guoping posed the task of "imagining, in verse, the world after artificial general intelligence arrives", and the model answered immediately: "When general AI arrives, the world changes like the wind; wisdom is boundless and within reach, human life is revitalized, and autonomous driving gallops across the land."

In terms of language understanding, the Spark model can not only sort out contextual relationships, but also give clear dialectical explanations and situational usages for expressions such as "prefer death to surrender" and "able to bend and stretch".

In knowledge Q&A, the Spark model can draw on search results and, using its language understanding and synthesis abilities, give more targeted answers.

Logical reasoning is a key task for testing a large model's intelligence. After two version iterations, iFLYTEK Spark can now handle complex reasoning under combined constraints very well, such as the classic puzzle of a farmer crossing a river with a wolf, a sheep, and vegetables.
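
To make the "combined constraints" concrete, here is a minimal breadth-first-search solver for that river-crossing puzzle. This is purely illustrative: the Spark model answers the puzzle in natural language, not by running a search.

```python
# BFS solver for the wolf-sheep-vegetables river-crossing puzzle.
from collections import deque

ITEMS = {"wolf", "sheep", "vegetables"}

def unsafe(bank):
    # Without the farmer present: the wolf eats the sheep,
    # and the sheep eats the vegetables.
    return {"wolf", "sheep"} <= bank or {"sheep", "vegetables"} <= bank

def solve():
    # State: (items on the left bank, side the farmer is on).
    start, goal = (frozenset(ITEMS), "left"), (frozenset(), "right")
    queue, seen = deque([(start, [])]), {start}
    while queue:
        (left, farmer), path = queue.popleft()
        if (left, farmer) == goal:
            return path
        here = left if farmer == "left" else ITEMS - left
        for cargo in [None, *here]:  # cross alone, or carry one item
            new_left = set(left)
            if cargo is not None:
                if farmer == "left":
                    new_left.discard(cargo)
                else:
                    new_left.add(cargo)
            other = "right" if farmer == "left" else "left"
            unattended = new_left if other == "right" else ITEMS - new_left
            if unsafe(unattended):
                continue
            state = (frozenset(new_left), other)
            if state not in seen:
                seen.add(state)
                queue.append((state, path + [(cargo or "nothing", other)]))

for cargo, side in solve():
    print(f"farmer takes {cargo} to the {side} bank")
```

The solver finds the classic seven-move plan, and the `unsafe` check is exactly the pair of constraints the model has to juggle in natural language.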

According to Hu Guoping, the Spark model's mathematical and coding abilities have also made significant progress since release. Its mathematical ability can solve high-school geometry and algebra problems step by step with accurate answers, and its code generation has made new breakthroughs, with Python generation in particular reaching a relatively high level.

The multimodal capability was demonstrated last: following Hu Guoping's instructions, the Spark model quickly generated a piece of prose, while a virtual human anchor with the image of a girl recited it at the same time.

Obviously, the Spark model's performance is outstanding. After scientific and systematic evaluation, the iFLYTEK Spark cognitive large model stands at the leading level among measurable systems in China.

From project initiation through release and iteration, the Spark model left very little time for R&D and training at each node. Yet judging by the abilities it has demonstrated, it remains in the top tier of China's large models. What secrets are hidden behind this?

Behind the stunning performance, the quality of the Ascend computing power base shows through

Beyond iFLYTEK's deep technical reserves accumulated over years in cognitive intelligence, the computing base supported by Ascend AI is particularly critical.

The first requirement of large model training is massive computing power.

Industry experts have done the math: training a 100-billion-parameter large model such as GPT-3 requires about 314 ZFLOPs of compute. With a single card delivering only 312 TFLOPS, training the model on one card would take 32 years.
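
A quick back-of-the-envelope check of that 32-year figure, using only the numbers quoted above (the FLOPs figures are the article's, not independently verified):

```python
# Single-card training time for a GPT-3-scale model, per the figures above.
total_train_flops = 314e21   # ~314 ZFLOPs of total training compute
single_card_flops = 312e12   # ~312 TFLOPS for one accelerator card

seconds = total_train_flops / single_card_flops
years = seconds / (365 * 24 * 3600)
print(f"single-card training time: {years:.1f} years")  # ~31.9 years
```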

Therefore, introducing distributed training and accelerating model training by building AI chip clusters has become the industry mainstream.

However, as chip clusters grow larger, partitioning a large model in parallel across the cluster generates heavy multi-card and inter-node communication between model shards, placing higher demands on the cluster's communication capability.
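
The article does not name a training framework; as a generic illustration of why inter-card communication matters, here is a minimal data-parallel sketch using PyTorch's DistributedDataParallel, where every backward pass triggers a gradient all-reduce across all cards:

```python
# Minimal data-parallel training sketch (PyTorch DDP).
# Launch with one process per card, e.g.:
#   torchrun --nproc_per_node=8 ddp_sketch.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")               # one process per card
    local_rank = int(os.environ["LOCAL_RANK"])    # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda()
    model = DDP(model, device_ids=[local_rank])   # wraps gradient all-reduce
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):
        x = torch.randn(32, 1024, device="cuda")
        loss = model(x).square().mean()           # toy objective
        opt.zero_grad()
        loss.backward()      # gradients are synchronized across cards here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

In tensor- or pipeline-parallel setups, where the model itself is sliced across cards, the communication between shards is heavier still, which is exactly the pressure on cluster interconnects described above.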

It is clear that large model training tests not only the scale of computing power, but also the engineering and systematization of the computing cluster.

Take the Spark model as an example: the overall training time is very short and the iteration pace fast, which means that beyond raw compute, the stability and scalability of model training must also hold up.

Let's look at how the Ascend AI cluster achieves this.

First, after a system-wide upgrade, all elements, including computing, storage, networking, and energy, are integrated together, effectively turning the AI data center into an AI supercomputer and doubling energy efficiency.

Second, an architecture designed around a backplane bus enables blind-mate insertion of all nodes and precise liquid cooling, delivering higher computing power density and a PUE below 1.15, which makes the computing center greener and allows more flexible expansion and deployment.

Finally, multi-level reliability design at the node, cabinet, cluster, and job levels makes system-level faults diagnosable, predictable, measurable, and recoverable, sustaining stable training runs of more than 30 days and achieving high availability.
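
The article does not describe Ascend's recovery mechanism in detail; as a generic sketch of the job-level technique, periodic checkpointing lets a multi-week training run resume after a fault instead of restarting from scratch (the file name and save granularity here are illustrative, not Ascend's):

```python
# Generic checkpoint/resume sketch for long-running training (illustrative only).
import os
import torch

CKPT_PATH = "train_state.pt"   # hypothetical checkpoint file

def save_state(model, opt, step):
    # Persist everything needed to resume: weights, optimizer state, position.
    torch.save({"model": model.state_dict(),
                "opt": opt.state_dict(),
                "step": step}, CKPT_PATH)

def load_state(model, opt):
    # Returns the step to resume from (0 if no checkpoint exists).
    if not os.path.exists(CKPT_PATH):
        return 0
    state = torch.load(CKPT_PATH)
    model.load_state_dict(state["model"])
    opt.load_state_dict(state["opt"])
    return state["step"] + 1
```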

In fact, as early as 2019, Ascend AI began exploring thousand-card clusters, at a scale of 4,000 cards at the time, which entered commercial use in 2020. At the recently concluded Ascend AI Industry Summit Forum, Huawei announced a comprehensive upgrade of the Ascend AI cluster, expanding it to 16,000 cards. In other words, a large model with 175 billion parameters and 100B tokens of training data can complete one training run in about half a day.
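
The half-day figure is roughly consistent with the common ~6·N·D estimate of transformer training FLOPs (N parameters, D training tokens); the per-card throughput and cluster utilization below are assumptions, since the article does not state them:

```python
# Sanity check of "about half a day" on a 16,000-card cluster.
n_params = 175e9                   # 175B parameters
n_tokens = 100e9                   # assuming "100B data" means 100B tokens
train_flops = 6 * n_params * n_tokens            # ~1.05e23 FLOPs

cards = 16000
per_card_flops = 312e12            # assumed per-card peak, as quoted earlier
utilization = 0.5                  # assumed effective cluster efficiency

seconds = train_flops / (cards * per_card_flops * utilization)
print(f"estimated training time: {seconds / 3600:.1f} hours")  # ~11.7 hours
```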

In fact, supporting the R&D and training of the Spark model is only one microcosm of Ascend AI's capabilities. At a higher level, Ascend AI has participated widely in building more than 20 artificial intelligence computing centers across the country. Seven of those cities, including Wuhan, Beijing, Xi'an, Chengdu, Dalian, and Shenyang, have received national recognition and joined the first batch of national new-generation artificial intelligence public computing open innovation platforms designated by the Ministry of Science and Technology.

At the same time, Ascend AI supports the development of nearly half of China's home-grown large models. According to the China Artificial Intelligence Large Model Map Research Report released in May this year, more than 30 Chinese large models with over 1 billion parameters are natively developed on or adapted to Ascend, covering fields such as NLP, multimodality, vision, and speech.

Having worked on so many projects, Ascend AI has accumulated considerable experience. In promoting the deployment of large model applications, Ascend AI is therefore not only a compute provider but also, with efficiency in mind, a shaper of the large model development process.

Large model development began with the traditional API-based approach, while Ascend AI has moved toward model-based development by providing a series of large model development kits. Under this development model, an entire development script can be written in just a few dozen lines of code, lowering the threshold for large model development.
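
The article does not show Ascend's kits themselves; as a stand-in for what "a few dozen lines" of model-based development looks like, here is an analogous snippet using the open Hugging Face Transformers API (an analogy only, not the Ascend toolchain):

```python
# Loading a pretrained model and generating text in a handful of lines.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Large models need strong computing power because"
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=40)
print(tok.decode(out[0], skip_special_tokens=True))
```

The point of such kits is the same on any platform: data pipelines, parallelism, and checkpointing are hidden behind a few declarative calls, so the developer writes intent rather than infrastructure.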

Clearly, facing the many difficulties of large model development and training, Ascend AI has risen to the challenge and chosen to meet it head-on. For Ascend AI itself, this means seizing an early position in the compute competition of the large model era. For the domestic large model industry as a whole, an architecture built on independently innovated software and hardware is a true demonstration of the country's technological strength.

On the road of innovation, Chinese AI needs more fellow travelers

The era of large models has just begun, and the future still holds many uncertainties. The only certainty is a continuous demand for computing power.

Hu Guoping predicted three trends in the development of large models.

First, more new large models will emerge in the future. Through continuous iteration, existing large models will see much larger data scales, and growing numbers of users on the application side will bring even greater demand for compute.

Second, as large model capabilities improve, they will take in data and produce intelligent output through more sensors and actuators. The boundaries of large models will diffuse further, consuming even more compute.

Third, in the future everyone will have their own dedicated large model or assistant. Evolving and upgrading at every moment in step with one's personal learning and life, such personal assistants pose a challenge for extremely low-power chips and system solutions.

It is not hard to see that all three trends are closely tied to computing power. In Hu Guoping's view, a large model works on a principle similar to the brain, which is composed of over 100 billion neurons that receive input stimuli and produce intelligent output, with similar mechanisms of stimulation and operation.

This also means that "whatever the brain can do, a large model can achieve too": the potential of large models is unbounded, and the exploration of the computing base is endless.

Of course, computing power alone is not enough to build a good large model.

Zhang Bo, an academician of the Chinese Academy of Sciences, professor in Tsinghua University's Department of Computer Science, and honorary dean of Tsinghua's Institute for Artificial Intelligence, believes that ChatGPT's success rests not just on the three elements of data, computing power, and algorithms, but on four elements: knowledge, data, algorithms, and computing power.

That is to say, we must first extract data from text and then extract knowledge from the data. This transformation produced today's ChatGPT, which is built on breakthroughs in three technologies: text semantic representation based on word embeddings, the attention-based Transformer, and self-supervised learning based on predicting the next word.
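
The third of those breakthroughs is easy to state in code: in next-word prediction, the training target is simply the input sequence shifted by one token, so the text supervises itself. A minimal sketch, with random tensors standing in for a real model and corpus:

```python
# Self-supervised next-token prediction objective in miniature.
import torch
import torch.nn.functional as F

vocab, seq_len, batch = 100, 8, 2
logits = torch.randn(batch, seq_len, vocab)         # stand-in model output
tokens = torch.randint(0, vocab, (batch, seq_len))  # stand-in token ids

# Predict token t+1 from position t: shift predictions and labels by one.
pred = logits[:, :-1, :].reshape(-1, vocab)
target = tokens[:, 1:].reshape(-1)
loss = F.cross_entropy(pred, target)
print(f"next-token cross-entropy: {loss.item():.3f}")
```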

From this perspective, data, algorithms, and computing power may look independent, but in a large model they are tightly interwoven, which underscores the importance of building an industrial ecosystem.

Ascend's AI industry ecosystem has grown rapidly. To date it has developed more than 30 hardware partners and more than 1,200 ISVs, and has jointly launched over 2,500 industry AI solutions. This ecosystem can feed directly into the large model industry.

In talent cultivation, more than 300 universities and colleges have partnered with Ascend AI, training over 100,000 AI professionals every year. The number of Ascend AI developers is also growing rapidly, surpassing 1.8 million this year.

On this foundation, Ascend AI announced at the forum that it will jointly release integrated training-and-inference solutions for large models with four ecosystem partners, namely iFLYTEK, Zhipu AI, CloudWalk Technology, and Mianbi Intelligence (ModelBest), to accelerate the deployment of large models and let them play a role in more vertical industries, such as smart cities, smart finance, smart coal mining, and smart manufacturing.

Undoubtedly, large models will usher in an era of their own. But even if that era has arrived, its decisive period is certainly not this opening year. Like other disruptive industrial technologies, the development of large models is destined to be a long run of time and endurance.

Of course, while the bullet is still in flight, before the decisive moment of the large model era arrives, we need more companies like iFLYTEK, and just as urgently, an Ascend AI that can supply powerful computing power.

