Sam Altman and Hinton make their China debut! China's most hardcore AI expert event concludes as the domestic large model series "Enlightenment 3.0" goes fully open source
**Source:** Xinzhiyuan
The Zhiyuan Conference, the annual "Spring Festival Gala" of China's AI community, has just concluded.
This annual AI summit brought together star teams such as OpenAI, DeepMind, Anthropic, Hugging Face, Midjourney, and Stability AI; global heavyweights Meta, Google, and Microsoft; and top universities including Stanford, UC Berkeley, and MIT.
Authors of landmark works such as GPT-4, PaLM-E, OPT, and LLaMA attended and walked the audience through their research. The conference offered both professional depth and creative inspiration, with every topic explored in depth.
The climax of the conference was undoubtedly the talks by Turing Award winners Yann LeCun and Geoffrey Hinton, and by OpenAI founder Sam Altman.
Geoffrey Hinton: Super AI risk is urgent
In the forum's closing keynote, Turing Award winner and father of deep learning Geoffrey Hinton sketched a scenario worth pondering: a superintelligence that pursues its own goals at humanity's expense.
And yes, in his view, that could happen soon.
Not long ago, Hinton resigned from Google and explained why in a nutshell: he regrets his life's work and worries about the dangers of artificial intelligence. He has repeatedly stated in public that AI poses a more urgent danger to the world than climate change.
At the Zhiyuan Conference, Hinton returned to the topic of AI risk.
What if a large neural network running on multiple digital computers could acquire knowledge directly from the world, rather than only absorbing human knowledge by imitating human language?
The idea is not far-fetched: such a network could learn unsupervised models of images and video, and copies of it could manipulate the physical world.
And if a superintelligence were allowed to set its own subgoals, one subgoal would be to gain more power, and to achieve it the superintelligence would manipulate the humans who use it.
Zhang Hongjiang and Sam Altman in dialogue: AGI may appear within ten years
That morning, Sam Altman also appeared via video link, giving his first public speech in China since ChatGPT took off.
The current AI revolution is so impactful not only because of the scale of its effects, but also because of the speed of progress. This brings both dividends and risks.
With increasingly powerful AI systems arriving, strengthening international cooperation and building global trust is paramount.
Alignment remains an open problem. OpenAI has spent the past 8 months on GPT-4's alignment work, mainly covering scalability and explainability.
In his speech, Altman repeatedly emphasized the need for global AI safety alignment and oversight, and even quoted a line from the Tao Te Ching.
In his view, AI is developing at explosive speed, and super AI may appear within the next ten years.
It is therefore necessary to advance AGI safety, strengthen international cooperation, and align research and deployment across borders.
Altman believes cooperation within the international science and technology community is the first constructive step to take right now; in particular, the mechanisms for transparency and knowledge sharing on progress in AGI safety should be improved.
He also mentioned that OpenAI's main research goal is currently AI alignment research, that is, how to make AI a useful and safe assistant.
Ultimately, OpenAI aims to train AI systems to help with alignment research.
After the speech, Zhang Hongjiang, chairman of the Zhiyuan Research Institute, held a remote dialogue with Sam Altman on how to achieve safe AI alignment.
Altman also said that GPT-5 will not arrive anytime soon.
After the session, Altman posted a message expressing his gratitude for the invitation to speak at the Zhiyuan Conference.
LeCun: still a fan of the world model
Yann LeCun, another Turing Award winner, spoke on the first day and continued to champion his "world model" concept.
He explained that AI cannot yet reason and plan the way humans and animals do, in part because current machine learning systems perform an essentially constant number of computational steps between input and output, regardless of how hard the problem is.
How can a machine understand how the world works, predict the consequences of its actions the way humans do, or break a complex task into multiple planning steps?
LeCun said the three major challenges for AI in the coming years are learning representations of the world, predicting world models, and exploiting self-supervised learning.
The key to building human-level AI may be the ability to learn a "world model".
This "world model" consists of six independent modules: a configurator, a perception module, the world model itself, a cost module, an actor, and a short-term memory module.
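To make the division of labor concrete, here is a minimal Python sketch of how the six modules might interact in one planning step. Everything in it (the class names, the toy one-dimensional state, the goal) is an illustrative assumption for exposition, not LeCun's published code.

```python
class Perception:
    def encode(self, observation):
        # Map raw sensory input to an abstract state (here: just a float).
        return float(observation)

class WorldModel:
    def predict(self, state, action):
        # Predict the next abstract state given a candidate action.
        return state + action

class Cost:
    def energy(self, state):
        # Lower energy = more desirable predicted state (toy goal: reach 10).
        return abs(state - 10.0)

class Actor:
    def propose(self, state):
        # Propose candidate actions to evaluate.
        return [-1.0, 0.0, 1.0]

class ShortTermMemory:
    def __init__(self):
        self.states = []  # rolling buffer of recent states

class Configurator:
    """Wires the other modules together for one planning step."""
    def plan(self, perception, world_model, cost, actor, memory, observation):
        state = perception.encode(observation)
        memory.states.append(state)
        # Pick the action whose predicted next state has the lowest cost.
        return min(actor.propose(state),
                   key=lambda a: cost.energy(world_model.predict(state, a)))

# Toy usage: one planning step toward the goal state.
cfg = Configurator()
action = cfg.plan(Perception(), WorldModel(), Cost(), Actor(),
                  ShortTermMemory(), observation=7)
print(action)  # -> 1.0, the action that moves state 7 toward the goal of 10
```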
When asked whether AI systems would pose an existential risk to humans, LeCun countered that we do not yet have a super AI, so how can we make a super AI system safe?
An "AI expert event" worthy of the name
The lively 2023 Zhiyuan Conference can fairly be called the highest-level and most closely watched conference in China's AI field this year.
From its founding, the defining characteristics of the Zhiyuan Conference have been clear: academic, professional, cutting-edge.
In the blink of an eye, this annual gathering of AI experts has reached its fifth year.
The 2023 Zhiyuan Conference continues that tradition, and the academic atmosphere is as overwhelming as ever.
In 2022, heavyweights including Turing Award winners Yann LeCun and Adi Shamir, father of reinforcement learning Richard Sutton, US National Academy member Michael I. Jordan, and Gödel Prize winner Cynthia Dwork gave talks.
The 2023 edition is undoubtedly the most star-studded yet.
Participants included four Turing Award winners, Yann LeCun, Geoffrey Hinton, Joseph Sifakis, and Yao Qizhi, along with OpenAI founder Sam Altman, Nobel laureate Arieh Warshel, Future of Life Institute founder Max Tegmark, 2022 Wu Wenjun Top Achievement Award winner Zheng Nanning, and Zhang Bo, academician of the Chinese Academy of Sciences.
"Enlightenment 3.0" is a large-scale model series.
Specifically, it includes the Aquila language large-scale model series, the Flag large-scale model evaluation system, the "Enlightenment · Vision" visual large-scale model series, and the multi-modal large-scale model series.
Large Language Model Series
Enlightenment·Aquila: fully open commercial license
First up is the Aquila series, the first open-source large language models to combine Chinese-English bilingual knowledge with support for domestic data compliance requirements, released under a fully open commercial license.
This open-source release includes base models with 7 billion and 33 billion parameters, the AquilaChat dialogue model, and the AquilaCode "text-to-code" generation model.
Stronger performance
Technically, the Aquila base models (7B, 33B) inherit the architectural design strengths of GPT-3, LLaMA, and others, swap in a batch of more efficient low-level operator implementations, redesign and reimplement the Chinese-English bilingual tokenizer, and upgrade the BMTrain parallel training method. As a result, Aquila's training efficiency is nearly 8 times that of Megatron+DeepSpeed ZeRO-2.
Specifically, the first gain comes from new techniques for parallel acceleration in the training framework.
Last year, Zhiyuan open-sourced FlagAI, a large-model algorithm project that integrates new parallel training methods such as BMTrain; the overlap of computation and communication during training has since been further optimized.
Second, Zhiyuan introduced operator optimization techniques and integrated them with the parallel acceleration methods to speed things up further.
Learning Chinese and English at the same time
Why is the release of Aquila so encouraging?
Because many large models "only learn English": they are trained almost entirely on English corpora, whereas Aquila must learn both Chinese and English.
You may have experienced this yourself: acquiring knowledge entirely in English is manageable, but learning English and then learning Chinese on top of it makes the difficulty overwhelming.
So compared with English-centric models such as LLaMA and OPT, training Aquila, which must absorb both Chinese and English knowledge, is many times harder.
To optimize Aquila for Chinese tasks, Zhiyuan made nearly 40% of its training corpus Chinese. The aim is for Aquila not only to generate Chinese, but to understand a large body of knowledge native to the Chinese-speaking world.
In addition, Zhiyuan redesigned and reimplemented the Chinese-English bilingual tokenizer to better recognize and segment Chinese text.
During training and design, the Zhiyuan team deliberately weighed quality against efficiency when choosing the tokenizer's size for Chinese tasks.
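As a quick sanity check of bilingual tokenization, here is a minimal sketch using the Hugging Face transformers API. The repo id BAAI/Aquila-7B is an assumption about where the tokenizer is published; adjust it to the actual location.

```python
from transformers import AutoTokenizer

# Assumption: the Aquila tokenizer is on the Hugging Face Hub under an
# id like "BAAI/Aquila-7B"; replace with the actual repo id if it differs.
tok = AutoTokenizer.from_pretrained("BAAI/Aquila-7B", trust_remote_code=True)

en = "Large language models learn from both Chinese and English corpora."
zh = "大语言模型同时学习中文和英文语料。"

# A well-designed bilingual tokenizer should keep Chinese token counts
# comparable to English ones instead of exploding Chinese text into
# byte-level fragments, which is what English-only vocabularies tend to do.
print(len(tok.tokenize(en)), len(tok.tokenize(zh)))
```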
The AquilaChat dialogue models (7B, 33B) are built on the Aquila base models and support fluent text dialogue and multilingual generation tasks.
Moreover, by defining an extensible specification of special instructions, AquilaChat can call other models and tools, and is easy to extend.
For example, it can call AltDiffusion, the multilingual text-to-image generation model open-sourced by Zhiyuan, for fluent text-to-image generation; paired with Zhiyuan's InstructFace multi-step controllable text-to-image model, it can also perform multi-step controllable editing of face images. A sketch of this tool-calling pattern follows.
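The announcement does not spell out the instruction format, so the sketch below invents a hypothetical [[tool:...]] syntax purely to illustrate how a dialogue model's output could be routed to an external tool such as a text-to-image model.

```python
import re

# Hypothetical sketch: routing an AquilaChat "special instruction" to an
# external tool. The [[tool:...]] syntax is invented for illustration and
# is not the actual Aquila specification.
TOOLS = {
    "text2image": lambda prompt: f"<image generated for: {prompt}>",
}

def dispatch(model_output: str) -> str:
    match = re.match(r"\[\[tool:(\w+)\]\](.*)", model_output, re.S)
    if match:
        name, payload = match.group(1), match.group(2).strip()
        return TOOLS[name](payload)   # hand off to the external model
    return model_output               # plain text answer, no tool call

print(dispatch("[[tool:text2image]] a panda painted in ink-wash style"))
```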
AquilaCode-7B "text-code" generation model, based on the powerful basic model capabilities of Aquila-7B, achieves high performance with a small data set and a small amount of parameters. It is currently the best open source code model that supports Chinese and English bilingual performance. After high-quality filtering, training is performed using training code data with compliant open source licenses.
More compliant and cleaner Chinese corpus
Compared with open-source models from abroad, Aquila's most distinctive feature is its support for domestic data compliance requirements.
Foreign large models may have some Chinese capability, but almost all the Chinese internet data used by foreign open-source models is extracted from web datasets such as Common Crawl.
Analyzing the Common Crawl corpus, however, shows that fewer than 40,000 usable Chinese webpages appear among 1 million entries, and 83% of those come from overseas websites, so the quality is plainly uncontrollable.
Aquila therefore used no Chinese corpus from Common Crawl at all, relying instead on Zhiyuan's own Wudao dataset, accumulated over the past three years. The Wudao Chinese dataset comes from more than 10,000 mainland Chinese websites, so its Chinese data meets compliance requirements and is cleaner.
Overall, this release is just a starting point. Zhiyuan's goal is a complete pipeline for large model evolution and iteration, so that the models keep growing as more data and capabilities are added, and everything will remain open source and open.
Notably, Aquila runs on consumer graphics cards: the 7B model, for example, can run in 16 GB of video memory or even less.
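As a rough check, 7 billion parameters in float16 occupy about 7e9 × 2 bytes ≈ 14 GB of weights, which is what makes the 16 GB figure plausible. Here is a minimal sketch of loading and querying the chat model on such a card, assuming (to be verified) that the weights are hosted on the Hugging Face Hub under an id like BAAI/AquilaChat-7B:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: weights hosted as "BAAI/AquilaChat-7B" on the Hugging Face Hub.
name = "BAAI/AquilaChat-7B"
tok = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    name,
    torch_dtype=torch.float16,  # halves memory versus float32
    device_map="auto",          # place layers on the available GPU
    trust_remote_code=True,
)

inputs = tok("请介绍一下北京的名胜古迹。", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tok.decode(out[0], skip_special_tokens=True))
```

Int8 or 4-bit quantization would shrink the footprint further, which is presumably what "even smaller video memory" refers to.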
Libra (Flag) large model evaluation system
A safe, reliable, comprehensive, and objective evaluation system is also essential to the technological innovation and industrial adoption of large models.
First, for academia: driving innovation in large models requires a yardstick for measuring their capability and quality.
Second, for industry: the vast majority of companies will use existing large models rather than develop their own from scratch, and they need an evaluation system to guide that choice. After all, building a base model in-house carries enormous compute costs; developing a 30-billion-parameter model requires at least 20 million yuan in compute, data, and other expenses.
Moreover, building a comprehensive evaluation system that combines automated evaluation with human subjective evaluation, and closing the loop automatically from evaluation results to capability analysis to capability improvement, has become one of the key barriers to innovation in base models.
To address this pain point, the Zhiyuan Research Institute chose to launch the Libra (Flag) large model evaluation system and open platform (flag.baai.ac.cn) first.
Specifically, the Flag evaluation system innovatively constructs a three-dimensional "ability-task-indicator" framework that describes a base model's cognitive capability boundary at fine granularity and visualizes the results.
Currently, the Flag evaluation system covers more than 600 evaluation dimensions, spanning 22 evaluation datasets and 84,433 questions, with datasets for further dimensions being integrated step by step.
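To illustrate the "ability-task-indicator" idea, here is a toy sketch of the three-dimensional score grid; the axis values below are invented examples, not the official taxonomy of 600+ dimensions.

```python
from itertools import product

# Toy axes in the spirit of the Flag (Libra) framework; illustrative only.
abilities  = ["language understanding", "reasoning"]
tasks      = ["multiple-choice QA", "summarization"]
indicators = ["accuracy", "robustness"]

# Each (ability, task, indicator) cell holds one score, so results can be
# sliced along any axis to map a model's capability boundary.
scores = {cell: None for cell in product(abilities, tasks, indicators)}
scores[("reasoning", "multiple-choice QA", "accuracy")] = 0.71  # example fill

# Slice: everything we know about the model's reasoning ability.
reasoning_view = {k: v for k, v in scores.items() if k[0] == "reasoning"}
print(reasoning_view)
```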
Going forward, the Flag evaluation system will continue to explore interdisciplinary research between large language model evaluation and fields such as psychology, education, and ethics, in order to evaluate large language models more comprehensively and scientifically.
Visual large model series
In computer vision, the Enlightenment 3.0 team has built the "Enlightenment · Vision" series of large models, with general scene perception and complex task processing capabilities.
Six state-of-the-art technologies form the foundation of "Enlightenment · Vision":
the multimodal large model Emu, the pre-trained large model EVA, the visual multi-task model Painter, a general-purpose visual segmentation model, the image-text pre-trained large model EVA-CLIP, and the video editing technology vid2vid-zero.
1. Emu: Completing everything in a multimodal sequence
Once trained, Emu can complete anything in a multimodal sequence context: it perceives, reasons over, and generates data across modalities such as images, text, and video, and handles multimodal tasks including multi-turn image-text dialogue, few-shot image-text understanding, video question answering, text-to-image generation, and image-to-image generation.
2. EVA: The strongest billion-parameter visual base model
EVA combines semantic learning (CLIP) with geometric structure learning (MIM) and scales the standard ViT model to 1 billion parameters for training. In one stroke, it achieved the strongest performance of its time across a wide range of visual perception tasks, including ImageNet classification, COCO detection and segmentation, and Kinetics video classification.
3. EVA-CLIP: The most powerful open source CLIP model
EVA-CLIP, built around the EVA vision base model, has been iterated up to 5 billion parameters.
The EVA-CLIP model reaches 82.0% zero-shot top-1 accuracy on ImageNet-1K, compared with 80.1% for the earlier OpenCLIP. On ImageNet kNN accuracy, Meta's latest DINOv2 model merely matches the 1-billion-parameter EVA-CLIP.
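For context on what "zero-shot top-1" means, here is a minimal CLIP-style classification sketch using the open_clip library. The model and checkpoint names are placeholders to replace with an actual EVA-CLIP release, and cat.jpg is a stand-in image path.

```python
import torch
import open_clip
from PIL import Image

# Assumption: an EVA-CLIP checkpoint is available through open_clip; the
# model/pretrained names below are placeholders to adapt to a real release.
model, _, preprocess = open_clip.create_model_and_transforms(
    "EVA01-g-14", pretrained="laion400m_s11b_b41k")
tokenizer = open_clip.get_tokenizer("EVA01-g-14")

image = preprocess(Image.open("cat.jpg")).unsqueeze(0)
labels = ["a photo of a cat", "a photo of a dog"]
text = tokenizer(labels)

with torch.no_grad():
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(text)
    img_feat /= img_feat.norm(dim=-1, keepdim=True)
    txt_feat /= txt_feat.norm(dim=-1, keepdim=True)
    probs = (100.0 * img_feat @ txt_feat.T).softmax(dim=-1)

# "Zero-shot": the class names above are the only supervision used.
print(labels[probs.argmax()])
```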
4. Painter: Pioneering the "in-context visual learning" technology path
The core modeling idea of the generalist vision model Painter is "vision-centric": by using images as both input and output, it draws on contextual visual information to complete different visual tasks.
5. "Horizon" general-purpose segmentation model: all-in-one, segment everything
Simply put, a user marks one class of objects in a scene, and the model can then identify and segment similar objects in batch, whether in the current image, other images, or video.
6. vid2vid-zero: The industry's first zero-shot video editing technology
The zero-shot video editing technique vid2vid-zero exploits, for the first time, the dynamic nature of the attention mechanism and combines it with an existing image diffusion model to build a video editing framework that requires no additional video pre-training. Just upload a video and type a text prompt, and you can edit the video's specified attributes.
The enlightener of China's large-scale model research
Established in November 2018, the Zhiyuan Research Institute is a pioneer of large model research in China, and after five years of development it has become a benchmark for the field domestically.
What sets it apart from other institutions is that Zhiyuan is a platform organization: from the outset, it took building an artificial intelligence innovation ecosystem as one of its basic missions.
How has Zhiyuan promoted the development of large model research in China since its founding?
For context: when OpenAI was founded in 2015, its main research direction was exploring the route to AGI, not large models.
After 2018, OpenAI began to focus on large models, releasing GPT with 117 million parameters in June of that year. The same year, Google released BERT, a large pre-trained language model with roughly 300 million parameters.
Everyone could see that the industry and technology trend of 2018 was to build ever-bigger models.
As the compute used by models grew, Moore's Law gave way to a so-called "model law": the compute used to train a large model doubles every 3-4 months.
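Taking 3.5 months as the midpoint of that doubling time, a quick calculation shows how much faster this is than classic Moore's Law (doubling roughly every 18 months):

```latex
% Annual growth factor under each regime (3.5- vs 18-month doubling):
\[
  \underbrace{2^{12/3.5} \approx 10.8\times/\text{yr}}_{\text{``model law''}}
  \qquad \text{vs.} \qquad
  \underbrace{2^{12/18} \approx 1.6\times/\text{yr}}_{\text{Moore's law}}
\]
```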
Against this backdrop, Zhiyuan released two large models, Enlightenment 1.0 and Enlightenment 2.0, in succession in 2021.
According to Huang Tiejun, at the Enlightenment 1.0 launch in March 2021, the Zhiyuan Research Institute judged that artificial intelligence had moved from a stage of "training many models" to a new stage of "training big models". Since then, the concept of the "big model" has entered public view.
Every year, the Zhiyuan Conference revisits the three major technical routes toward AGI: large models, life intelligence, and AI4Science. These routes are not isolated; they interact with and influence one another.
Language data itself contains rich knowledge and intelligence, which large models extract, using neural networks to express the regularities behind complex data.
That is the rationale for why the large model route is one plausible path to AGI.
It also explains why Zhiyuan focused on large models from the start: Enlightenment 1.0 was released in March 2021, followed by Enlightenment 2.0 in June.
Beyond large models, Zhiyuan also keeps exploring the other two roads to AGI, "life intelligence" and "AI4Science".
In 2022, Zhiyuan released the most accurate simulation of the nematode Caenorhabditis elegans to date. This time, Zhiyuan has opened Tianyan (eVolution), the life simulation platform used in the artificial nematode research, as an online service.
Tianyan is an ultra-large-scale platform for simulating detailed neuronal networks, with four notable features: it is the most efficient platform for detailed neuronal network simulation; it supports ultra-large-scale neural network simulation; it provides a one-stop online modeling and simulation toolset; and its high-quality visual interaction supports real-time simulation and collaborative visual operation.
Built on the Tianyan platform, the team performs high-precision simulation of biological intelligence, explores the essence of intelligence, and advances biologically inspired general artificial intelligence. Going further, the Tianyan team has deployed Tianyan on China's new generation of exascale supercomputer, the next-generation Tianhe.
With "Tianyan-Tianhe" successfully deployed and running, models such as a detailed network simulation of the mouse V1 visual cortex can be run with more than a 10-fold reduction in energy consumption and a more than 10-fold increase in speed, reaching world-leading performance in detailed neuronal network simulation and laying a solid foundation for a detailed simulation of the whole human brain.
Now, two years later, Zhiyuan has released the Enlightenment 3.0 series of large models.
In terms of positioning, since the release of Enlightenment 2.0, Zhiyuan, as a non-profit platform organization, has gone beyond building models to making distinctive contributions to the core ecosystem around large models.
That includes the data curation behind the models, model and algorithm evaluation, open-source organization, and a comprehensive layout of compute platforms.
Why did Zhiyuan make such a change?
Because Zhiyuan understands deeply that the large model itself is not the most important product form of the large model era; the era is characterized by systematization and intelligent services.
Large models will keep evolving; what endures is the technical iteration behind them, that is, the algorithms for training models.
The latest model you see on any given day is just a frozen snapshot. What matters is whether the training algorithms are advanced, whether costs are effectively reduced, and whether the capabilities behind the model are explainable and controllable.
So, as a platform organization, what Zhiyuan must do is bring the industry's model-training algorithms together into an iterating whole.
This work is necessary: Zhiyuan works not only on large model algorithms themselves, but spends even more time and energy on the technical systems around large models.
For example, Zhiyuan launched the "Jiuding Intelligent Computing Platform", a large-scale cloud computing service platform that provides compute, data, and algorithm support for large model training.
Of course, this relies not only on Zhiyuan's own strength but also on open, collaborative iteration with industry, academia, and research institutes.
In March this year, Zhiyuan released the FlagOpen (Feizhi) large model technology open-source system, an open software system for large models jointly built with a number of industry, academic, and research partners.
You may ask: compared with previous editions, what is the biggest feature of this year's Zhiyuan Conference?
The style is consistent, and can be summed up in two words: professional and pure.
The Zhiyuan Conference is held without commercial goals and pays little attention to products or investors.
Here, industry leaders can offer personal opinions and professional judgments, including clashes and debates between top thinkers, without weighing many practical considerations.
"Godfather of AI" Geoffrey Hinton took part in the Zhiyuan Conference for the first time this year. Having recently resigned from Google over regrets about his life's work, he presented his latest views on AI safety.
As ever, the "optimist" Yann LeCun does not worry about AI risk the way most people do. In his view, it is unreasonable to slam the brakes before the car is even built; for now, we should keep working to develop more advanced AI technology and algorithms.
You could also watch fierce clashes of views at the meeting, such as Max Tegmark on controlling AI risk: not entirely opposed to LeCun, but markedly different.
This is the biggest highlight of the Zhiyuan Conference, and its consistent style.
The uniqueness of this positioning has become ever more valuable in recent years.
AI's impact on the world and on China keeps growing, so everyone needs a venue to express views in a pure way, including clashes of ideas and heated debate.
The point is that the more professional, pure, neutral, and open a conference is, the more it helps everyone get a grip on an era of such rapid development.
Abroad, the Zhiyuan Conference also enjoys an excellent reputation; international organizations regard it as a window for cooperation with China in AI research.
The name Zhiyuan means "the source of intelligence", and hosting the Zhiyuan Conference has become a landmark event in promoting the AI ecosystem.
The strong guest lineup, rich topic setting, and depth of discussion make the Zhiyuan Conference unique.
This top-level event for AI experts has become a shining calling card for China's AI field.