Using Models to "Refine" Models, Zhiyuan Releases an Open-Source "Full Suite"; Big Models Can't Bring AGI, LeCun Proposes Three Major Challenges | Frontline
By | Zhou Xinyu
Editor | Deng Yongyi
Is Artificial General Intelligence (AGI) the next stop for big models?
At the Zhiyuan Conference that opened on June 9th, top talents in the AI field from China and the United States discussed the future of big models and AGI.
Because of its non-profit, research-oriented nature, Zhiyuan is regarded by the industry as "China's early OpenAI" and as the Huangpu Military Academy of domestic AI talent. Its big-model research project "Wudao" gathered the first tier of university-born AI entrepreneurship teams, including Tang Jie of Zhipu Huazhang, Yang Zhilin of Circular Intelligence, and Lu Zhiwu of Yuancheng Xiang.
At this year's conference, the star-studded lineup also suggests that it is time to build a global industry consensus on the future of big models. Attendees included Turing Award winners Geoffrey Hinton, Yann LeCun, Joseph Sifakis, and Yao Qizhi, as well as Midjourney founder David Holz. OpenAI founder Sam Altman will join a sub-forum on June 10th themed on "safety".
"A transparent and open ecosystem" is one of the themes. It aims to probe the "ceiling" of current large-model capabilities by building an open-source ecosystem and a model capability evaluation system.
Since launching "Wudao 2.0", the "world's largest model" with 1.75 trillion parameters, in 2021, Zhiyuan has now released "Wudao 3.0" at the conference: a model service platform built around three open-source model systems. The services provided by "Wudao 3.0" cover underlying data processing and aggregation, model capabilities, and algorithm evaluation.
Another theme revolves around how people can truly move from the big model era to AGI.
"They (pre-trained models) do not have knowledge of the underlying reality." Speaking over a video link, Yann LeCun, one of the "Big Three of deep learning" and Meta's chief AI scientist, first handed the currently red-hot big models a "death sentence" on the road to AGI. He then offered his idea of a solution: "observe the world like a baby".
Base models + evaluation tools + iteration plan: an open-source "full suite" release
Unlike versions 1.0 and 2.0 released in 2021, whose goal was purely to "refine big models", "Wudao 3.0", in the words of Huang Tiejun, director of the Zhiyuan Research Institute, is an ecosystem centered on big models. It covers underlying data processing and aggregation, model capability and algorithm evaluation, and open-source openness, forming an efficient system of big-model technology and algorithms.
Put simply, "Wudao 3.0" is about using models to "refine" models in a more scientific and controllable way.
The system behind this includes two open-source big-model bases, the language model series "Wudao Tianying" and the visual model series "Wudao Vision"; an open-source big-model evaluation system and open platform, FlagEval (Libra); and an open-source big-model technology system, FlagOpen (Feizhi).
1. "Wudao Tianying" and "Wudao Vision"
The language model series "Wudao Tianying" includes the base model Aquila (in 7B and 33B versions), the dialogue model AquilaChat, and the text-to-code generation model AquilaCode.
Benchmarked against ChatGPT, AquilaChat not only provides dialogue functionality but also makes up for the limits of a single-modality dialogue model by defining an extensible instruction specification through which it can call the APIs of other models and third-party tools.
AquilaChat's written dialogue ability. Source: Zhiyuan
For example, AquilaChat alone cannot generate images from text, but by calling Zhiyuan's open-source text-to-image model AltDiffusion, it can make up for the text-only model's blind spot. If the image editor InstructFace is further called, users can then edit the generated image.
AquilaChat calls AltDiffusion to generate images. Source: Zhiyuan
AquilaChat calls the image editor InstructFace to adjust the portrait. Source: Zhiyuan
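Zhiyuan has not published the exact instruction specification described above, so the snippet below is only a minimal sketch of the idea: a chat model emits a structured instruction, and a thin router dispatches it to an external tool such as a text-to-image model. All function names, the JSON format, and the tool registry are assumptions for illustration.

```python
import json

# Hypothetical wrappers around the dialogue model and the text-to-image model.
# Neither the function names nor the instruction format come from Zhiyuan's docs;
# they only illustrate how an extensible instruction spec could route a request.

def aquilachat_generate(user_message: str) -> str:
    """Stand-in for the dialogue model: returns a fake structured instruction."""
    return json.dumps({
        "tool": "text_to_image",                 # which external capability to invoke
        "model": "AltDiffusion",                 # assumed tool name
        "arguments": {"prompt": user_message},
    })

def call_alt_diffusion(prompt: str) -> str:
    """Placeholder for a text-to-image API call; returns a pretend file path."""
    return f"/tmp/generated_{abs(hash(prompt)) % 10000}.png"

# Registry mapping instruction names to handlers.
TOOLS = {"text_to_image": lambda args: call_alt_diffusion(args["prompt"])}

def route(user_message: str) -> str:
    """Parse the model's structured instruction and dispatch to the right tool."""
    instruction = json.loads(aquilachat_generate(user_message))
    handler = TOOLS[instruction["tool"]]
    return handler(instruction["arguments"])

if __name__ == "__main__":
    print(route("Draw a watercolor painting of West Lake at sunset"))
```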
The AquilaCode-7B code model launched this time can already achieve performance close to OpenAI's Codex-12B with less training data and fewer parameters, and it adapts well across different chip architectures.
AquilaCode writes the code for a clock program. Source: Zhiyuan
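The article does not include code for calling AquilaCode, so here is a hedged sketch of how one might prompt a causal code-generation model through the Hugging Face transformers API. The checkpoint identifier is an assumption; substitute whichever AquilaCode (or other code model) weights are actually available.

```python
# Minimal sketch of prompting a causal code-generation model with transformers.
# The checkpoint name below is assumed for illustration only.
from transformers import AutoModelForCausalLM, AutoTokenizer

CHECKPOINT = "BAAI/AquilaCode-7B"  # hypothetical identifier

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(CHECKPOINT, trust_remote_code=True)

prompt = "# Write a Python program that prints the current time every second\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```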
Meanwhile, "Wudao Vision" also launched vid2vid-zero, a zero-shot video editing method built on the base model. "Zero-shot" here means that dynamic manipulation of attention mechanisms is combined with image diffusion models, replacing the original approach of training a model on large amounts of video data.
For example, when a runner appears in the frame, the algorithm can automatically distinguish the moving person from the scenery behind them; by entering a prompt, the person and the scenery can be edited separately.
vid2vid-zero's segmentation of video frame elements. Image source: the vid2vid-zero paper
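For readers who want to see the per-frame baseline that such a zero-shot method improves on, here is a naive sketch using an off-the-shelf instruction-guided image diffusion pipeline from the diffusers library. It is not the vid2vid-zero implementation: vid2vid-zero additionally manipulates cross-frame attention so that edits stay temporally consistent, whereas this sketch simply edits each frame independently. The frame paths and prompt are made up.

```python
# Naive per-frame video editing with an image diffusion model and a text prompt.
# vid2vid-zero goes further by injecting cross-frame attention for consistency.
import os
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from PIL import Image

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

frames = [Image.open(f"frames/{i:04d}.png") for i in range(16)]  # assumed paths
prompt = "make the runner wear a red jacket, keep the background unchanged"

os.makedirs("edited", exist_ok=True)
for i, frame in enumerate(frames):
    edited = pipe(prompt, image=frame, num_inference_steps=20).images[0]
    edited.save(f"edited/{i:04d}.png")
```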
2. FlagEval (Libra)
Beyond continuously improving the model's reasoning and deduction over multimodal content, improving model "interpretability" is equally important: it helps us understand where the "intelligence" of large models comes from, much as we study the brain, and offers fundamental ways to improve model performance and safety.
At present, FlagEval has built a three-dimensional "capability-task-metric" evaluation framework covering 30+ capabilities, 5 task types, and 4 categories of metrics, for a comprehensive evaluation across more than 600 dimensions.
FlagEval's three-dimensional "capability-task-metric" evaluation framework. Source: Zhiyuan
Combined with "Wudao Tianying", FlagEval currently supports evaluation of two types of large models, language and text-image, and has launched a large language model evaluation system, the multilingual text-image model evaluation tool mCLIP-Eval, and the text-to-image generation evaluation tool ImageEval.
At the same time, FlagEval is not a static scoring tool: through automated evaluation and adaptive evaluation mechanisms, it gives targeted training suggestions for the model.
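FlagEval's internal data structures are not public, so the toy snippet below only illustrates how a three-dimensional "capability-task-metric" framework expands into individual evaluation dimensions. The example capabilities, tasks, and metrics are invented; the real taxonomy covers 30+ capabilities, 5 task types, and 4 metric categories for 600+ dimensions.

```python
from itertools import product

# Made-up example entries; FlagEval's real taxonomy is far larger.
capabilities = ["reading comprehension", "code generation", "image-text matching"]
tasks = ["multiple choice", "free-form generation"]
metrics = ["accuracy", "human preference"]

# Each (capability, task, metric) triple is one evaluation dimension.
dimensions = [
    {"capability": c, "task": t, "metric": m}
    for c, t, m in product(capabilities, tasks, metrics)
]

print(f"{len(dimensions)} evaluation dimensions")  # 3 * 2 * 2 = 12 in this toy example
for d in dimensions[:3]:
    print(d)
```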
3. FlagOpen
Broadly speaking, FlagOpen is a "repository" of open-source data, algorithms, models, tools, and evaluation systems. Components such as the framework FlagBoot and the AIGC application FlagStudio act as "storerooms" that each take on different functions.
For example, FlagAI, the open-source project for large-model algorithms, integrates mainstream large-model algorithms and technical solutions from around the world, including the language models OPT and T5, the vision models ViT and Swin Transformer, and the multimodal model CLIP.
FlagOpen builds an open source repository that covers data, algorithms, models, tools, and evaluation systems. Source: Zhiyuan
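FlagAI's own loading API is not shown in the article, so as an illustration of the kind of multimodal model it aggregates, here is a short example that scores image-text similarity with the Hugging Face transformers implementation of CLIP; the local image path is assumed.

```python
# Score how well an image matches candidate captions with CLIP (transformers API).
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("cat.png")  # assumed local image
texts = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
logits = model(**inputs).logits_per_image  # higher logit = better image-text match
print(logits.softmax(dim=-1))
```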
Big models, brain-like intelligence, embodied intelligence: three roads to AGI
The emergence of intelligence in large language models is certainly exciting, but the consensus among scholars at the Zhiyuan Conference is that pre-trained models, even ones as strong as GPT, are hard-pressed to reach true AGI.
The reason lies in the self-supervised training method. The rise of self-supervision replaced supervised learning, which needs huge amounts of annotated data, and raised the efficiency of deep learning. But the problem with self-supervision is also obvious: the machine only learns to predict the missing parts of its input; it does not understand the actual relationships behind what comes before and after.
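To make "predicting the missing parts" concrete, here is a toy self-supervised next-token objective in PyTorch (not any particular model's training code): the target is simply the input shifted by one token, so the model is rewarded only for reproducing surface continuations, with no external grounding involved.

```python
# Toy next-token prediction objective (self-supervised): the target is just
# the input shifted by one position, so no external labels are needed.
import torch
import torch.nn.functional as F

vocab_size, seq_len, batch = 100, 16, 4
tokens = torch.randint(0, vocab_size, (batch, seq_len))   # stand-in for text
embed = torch.nn.Embedding(vocab_size, 32)
lstm = torch.nn.LSTM(32, 32, batch_first=True)             # stand-in for a transformer
head = torch.nn.Linear(32, vocab_size)

hidden, _ = lstm(embed(tokens[:, :-1]))                     # predict from the prefix
logits = head(hidden)                                       # (batch, seq_len-1, vocab)
targets = tokens[:, 1:]                                     # "missing part" = next token
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
print(loss.item())
```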
As Yann LeCun put it: "If you train these models on one or two trillion tokens of data, their performance is amazing. But in the end they make very stupid mistakes. They make factual errors and logical errors, they are inconsistent, their reasoning is limited, and they can produce harmful content."
This leads to a predicament today's AI finds hard to overcome: large models have no basic knowledge of reality.
So how might we get to AGI? Three routes came up at the conference:
Big models: by drawing on massive amounts of high-quality data, complex AI systems show the first signs of emergent intelligence;
Brain-like intelligence: building basic neural network structures, signal-processing mechanisms, and the like, so that machines achieve abilities similar to those of biological or human brains;
Embodied intelligence: giving an agent a body through which to interact with and learn from the physical world.
Yann LeCun delivered his talk remotely. As early as 2022, he had proposed the concept of the "World Model": letting a machine build an internal model of how the world works by observing it, the way a baby does, and then using that model to predict the consequences of its actions. Around the world model he sketched an architecture for "Autonomous Intelligence".
However, before a true AGI era arrives, he believes AI must first confront three major challenges in the coming years:
First, learning representations and predictive models of the world from observation; second, learning to reason; third, learning to plan complex sequences of actions in a hierarchical way.