The mythical GPT cannot build your dream car

Since ChatGPT exploded onto the scene, large AI models have become a hot spot chased by technology companies. From chat, to image generation, to office software, AI suddenly seems to possess the supernatural power to upend everything overnight.

The craze has spread to the automotive industry, and practitioners have begun to ask: could GPT be put to work building cars?

Some automakers announced that they would apply large-model technology themselves, others said they would plug into third-party large models, and still others rushed to release driver-assistance systems with "GPT" in the name.

Practitioners told Shentu that the smart cockpit and autonomous driving are likely to be the first in-car application scenarios for large models, with autonomous driving the most eagerly anticipated.

Autonomous driving is an extremely difficult track. Beyond technology giants such as Google and Baidu, a long line of talented entrepreneurs have thrown themselves into it and burned through billions of dollars, yet so far no one has produced a satisfying result.

Now that large AI models are entering autonomous driving, will it be different this time?

How closely are GPT and cars related?

On the surface, GPT has no direct relationship with cars; in reality, the two are deeply connected. The story begins six years ago.

In June 2017, Tesla boss Elon Musk poached a Slovak-born researcher from OpenAI: Andrej Karpathy, who went on to become Tesla's director of AI.

Musk was intensely interested in artificial intelligence at the time and was also one of OpenAI's co-founders. Shortly after recruiting Karpathy, he left OpenAI's board of directors, reasoning that since both Tesla and OpenAI were researching AI, conflicts of interest might arise.

At Tesla, Karpathy rewrote the Autopilot algorithms and developed BEV pure-vision perception, taking Tesla's driver assistance into a new stage. Meanwhile his former employer, OpenAI, bet all its chips on general artificial intelligence and eventually produced GPT.

From a product perspective, OpenAI's GPT and Tesla's BEV are entirely different species. At the level of underlying technology, however, both rest on the same artificial intelligence foundations, above all Google's Transformer model.

The Transformer is a deep learning neural network architecture proposed by eight Google AI scientists in 2017, and one of the most important inventions in the industry. The "T" in today's wildly popular ChatGPT stands for Transformer.

Unlike traditional RNN and CNN networks, the Transformer uses a self-attention mechanism to mine the connections and correlations between different elements of a sequence, which makes it well suited to sequential data. That is why it performs so strongly on tasks such as machine translation, text summarization, and question answering.
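
To make the self-attention idea concrete, here is a minimal sketch of scaled dot-product self-attention (a single head, no positional encoding), written in plain NumPy. The shapes and weight names are illustrative only, not any particular library's API.

```python
# Minimal sketch of scaled dot-product self-attention (single head),
# the core mechanism of the Transformer described above.
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_q/w_k/w_v: (d_model, d_head)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v            # project into query/key/value spaces
    scores = q @ k.T / np.sqrt(k.shape[-1])        # pairwise relevance between positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over the sequence
    return weights @ v                             # each position mixes information from all others

# toy usage: a "sequence" of 4 elements with 8-dimensional embeddings
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)      # (4, 8)
```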

The Transformer was therefore first applied in NLP (natural language processing) to understand human text and language.

By pre-training on the Transformer architecture and then continuously fine-tuning and iterating, OpenAI successively launched the large language models GPT-1, GPT-2, GPT-3, and GPT-4. ChatGPT is a dialogue bot built by fine-tuning a GPT-3-series model. Because it interacts conversationally, is easy for ordinary people to use, and is noticeably "smarter" than earlier chatbots, it took off.

Fundamentally, ChatGPT's GPT model, Google's LaMDA model, and Baidu's Wenxin model all share the same origin.

Applying the Transformer to natural language gave birth to chat applications such as ChatGPT; applying it to computer vision has produced striking results as well. The pioneer there is Tesla.

During his tenure as Tesla's director of AI, Karpathy led the computer vision team for autonomous driving. By building on the Transformer, Tesla developed its BEV technology.

BEV stands for Bird's Eye View. It stitches together the 2D images captured by the cameras, reconstructs the scene in three dimensions, and converts everything into a unified top-down view for processing, a kind of God's-eye perspective. The reason is simple: driving takes place in three-dimensional space, and what people see is a three-dimensional world, not a flat 2D image.
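
For intuition about what converting to a top-down view means geometrically, here is a simplified sketch that projects a flat ground-plane grid into a single camera image and copies pixels into a bird's-eye map. It is only an illustration under strong assumptions (one camera, a perfectly flat ground plane, known intrinsics); Tesla's actual approach learns this mapping across multiple cameras with a Transformer rather than assuming flat ground.

```python
# Simplified sketch of the geometric idea behind a bird's-eye view (BEV):
# sample a flat ground-plane grid around the car, project each grid cell into
# the camera image, and copy the corresponding pixel into a top-down map.
import numpy as np

def image_to_bev(image, K, cam_height, bev_range=20.0, bev_res=0.1):
    """image: (H, W, 3); K: 3x3 camera intrinsics; cam_height: metres above ground."""
    h, w, _ = image.shape
    n = int(2 * bev_range / bev_res)
    bev = np.zeros((n, n, 3), dtype=image.dtype)
    # camera convention: x lateral, y down, z forward; the ground sits at y = cam_height
    xs = np.linspace(-bev_range, bev_range, n)
    zs = np.linspace(0.1, 2 * bev_range, n)
    for i, z in enumerate(zs):
        for j, x in enumerate(xs):
            u, v, depth = K @ np.array([x, cam_height, z])   # pinhole projection of a ground point
            u, v = int(u / depth), int(v / depth)
            if 0 <= u < w and 0 <= v < h:
                bev[n - 1 - i, j] = image[v, u]              # top of the map = farthest ahead
    return bev
```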

Karpathy demonstrated this brand-new perception approach at Tesla AI Day in August 2021. To make it work, Tesla did not hesitate to rewrite its Autopilot algorithms and rebuild its infrastructure for training deep neural networks.

It was the first time large-model technology had been applied in the autonomous driving industry.

Looking back today, GPT itself remains mainly a natural language technology and cannot drive a car, but the large-model technology behind it, and the Transformer architecture in particular, has already been applied in the autonomous driving field.

From natural language processing to computer vision, the two fields now share a unified modeling structure built on the Transformer architecture, which makes joint modeling easier.

And as their understanding of AI deepens, car companies are looking more and more like artificial intelligence companies. Beyond Tesla, Li Auto announced its corporate vision earlier this year: to become an artificial intelligence company by 2030. It plans to launch its urban NOA navigation-assisted driving system this year, supported by BEV perception and the Transformer model.

Letting AI talk to people and letting AI drive a car start to look like the same problem deployed in different scenarios. Humans have never lacked imagination when it comes to turning an underlying technology into concrete products.

What GPT can teach autonomous driving

Since the beginning of this year, the capabilities GPT has demonstrated have stunned the outside world; general artificial intelligence no longer looks like a castle in the air. People in the autonomous driving industry have begun to wonder whether what generative AI has achieved with language models could be migrated to autonomous driving.

Essentially, a language model is a mathematical model of human language. The computer still does not understand natural language; it turns a language problem into a mathematical one. By predicting the probability of the next word given the text that came before, the model comes to "understand" natural language indirectly.

Switch to the driving scene: given the current traffic environment, a navigation map, and a driver's history of driving behavior, can a large model predict the next driving action?
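
As a rough sketch of that framing, the snippet below treats discretized driving actions like tokens and uses a small causal Transformer to predict the next action from the history. The action vocabulary, the model sizes, and the omission of the traffic environment and map inputs are all simplifying assumptions made for illustration, not anyone's production system.

```python
# Sketch: treat discretized driving actions like language tokens and predict the
# next action from the history with a small causal Transformer.
import torch
import torch.nn as nn

ACTIONS = ["keep_lane", "accelerate", "brake", "turn_left", "turn_right"]

class DrivingActionModel(nn.Module):
    def __init__(self, vocab_size=len(ACTIONS), d_model=64, n_heads=4, n_layers=2, max_len=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):                      # tokens: (batch, seq_len) of action ids
        seq_len = tokens.shape[1]
        pos = torch.arange(seq_len, device=tokens.device)
        x = self.embed(tokens) + self.pos(pos)
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len).to(tokens.device)
        x = self.encoder(x, mask=mask)              # causal mask: each step sees only past actions
        return self.head(x)                         # logits over the next action at every step

# toy usage: predict the action that follows a short history
model = DrivingActionModel()
history = torch.tensor([[0, 1, 0, 2]])              # keep_lane, accelerate, keep_lane, brake
logits = model(history)
print(ACTIONS[logits[0, -1].argmax().item()])
```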

Yu Kai, founder of Horizon Robotics, said at the China EV100 Forum in April this year that ChatGPT had inspired him greatly: "We will keep using big data, bigger data, bigger models, and unsupervised learning of how humans drive, just as you would learn from a large amount of unsupervised, unlabeled natural text." In his view, each driver's sequence of driving controls is like a natural language text. His next ambition is to build a large language model oriented toward autonomous driving.

In theory, the idea is feasible; AI already knows how to learn. With an adaptive language model, the machine keeps iterating on user feedback, learning the user's habits and improving itself, a technique ChatGPT already uses. By the same logic, it should not be hard for a machine to learn a driver's habits.

Tesla's shadow mode works along these lines: the driving data of real drivers is fed into machine learning, and the algorithm is trained by comparing its decisions with the behavior of human drivers.
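
The logic of shadow mode as described here might be sketched as follows: the planner runs silently alongside the human driver, its proposal is compared with what the driver actually did, and disagreements are logged as candidate training data. All function and field names below are hypothetical, not Tesla's actual implementation.

```python
# Illustrative sketch of the "shadow mode" idea: compare the planner's silent
# proposals with the human driver's real actions and log the disagreements.
from dataclasses import dataclass

@dataclass
class Frame:
    sensor_data: dict      # camera images, speed, etc.
    human_action: str      # what the driver actually did, e.g. "brake"

def shadow_mode(frames, propose_action, log_disagreement):
    """Compare the planner's proposal with the human driver, frame by frame."""
    disagreements = 0
    for frame in frames:
        proposed = propose_action(frame.sensor_data)   # planner output, never executed
        if proposed != frame.human_action:             # machine disagrees with the human
            log_disagreement(frame, proposed)          # candidate data for retraining
            disagreements += 1
    return disagreements
```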

One lesson the industry has drawn from the new AI boom GPT set off is that when a model's parameters and training data are scaled up exponentially, producing the so-called large model, its capabilities can suddenly emerge once a critical point is crossed.

In the past, the data a model needed during training had to be labeled by hand. In autonomous driving, for example, annotators label huge numbers of images to tell the machine what a cat is, what a dog is, and how many kinds of cats and dogs there are. The annotator acts as the machine's teacher, teaching it to understand the world over and over again.

The problem is that whatever the teacher never taught, the machine still cannot handle. A typical example: Tesla has had repeated Autopilot accidents in which the vehicle hit an overturned truck because the machine could not recognize it.

He Yuhua, founding partner of Hegao Capital, gave Shentu an example: Guangzhou has frequent rainy days in summer, and in dimly lit scenes the air can fill with flying insects. When a car drives through and its headlights hit them, thousands of insects may strike the front of the car, and the car's perception system may mistake the swarm for a wall.

An autonomous driving system cannot exhaust every corner case (extreme scenario), and that is one of the biggest difficulties in its development.

ChatGPT, by contrast, ingests unlabeled data from across the internet. In self-supervised learning, the data itself serves as the supervisory signal rather than human-made labels. At some point people discovered that, in the course of digesting all this data, the large model had suddenly gained the ability to generalize from one case to others.

So if a large model for autonomous driving could likewise learn human driving behavior without supervision, with no "teacher" required, would the system suddenly turn into a seasoned "old driver"?

GPT "driving", is not reliable

Dreams are beautiful, but the road to realizing them is harsh.

For a ChatGPT-like AI model to show its power in autonomous driving, at least the following problems must be solved first.

The first is the data source.

ChatGPT's data sources are extremely rich: Wikipedia, books, news articles, scientific journals and more, essentially the public data of the entire internet serving as its nourishment.

Autonomous driving is different. Driver and vehicle data are not public, and much of it involves privacy. Automakers and self-driving companies each operate in isolation, keeping their data closed and uncirculated, which makes data hard to obtain. Without data, autonomous driving is a river without a source.

He Zhiqiang, president of Lenovo Venture Capital, told Shentu that data is the core of autonomous driving and is essential for training models. OEMs like BYD have the data, but their algorithms still need refining. The newer carmakers known collectively as "Wei-Xiao-Li" (NIO, Xpeng, and Li Auto) are strong on algorithms but do not yet sell enough cars. Only companies with both data and algorithms can make full use of large models.

The second is the constraint on how and where the computation can be deployed.

Yu Kai notes that OpenAI runs ChatGPT in the cloud, with ample energy, power supply, and well-engineered systems behind it. A car, however, depends on its own battery and its own heat dissipation, which makes the challenge enormous: autonomous driving cannot simply run that large a model with that much computation on board.

Large models' appetite for computing power has made cloud providers the first players to reap dividends from this wave of the AI boom, and big companies' cloud build-outs in turn clear the way for large models. Inside the car, though, this becomes a contradiction.

A bigger problem is that the reliability of large models has not been verified.

Anyone who has used ChatGPT knows that it sometimes talks nonsense, right one moment and wrong the next. The industry calls this a tendency to hallucinate: producing content that is not real and has no provenance at all. Large models make things up without regard for truthfulness or accuracy.

A chatbot can afford to talk nonsense; autonomous driving cannot. Any wrong output could be fatal.

"ChatGPT has made great progress, but automatic driving has not come yet, because automatic driving, especially unmanned driving, may have a zero fault tolerance rate, which is a matter of human life." Yu Kai said.

Long Zhiyong, former COO of a Silicon Valley AI startup, believes that being uncontrollable, unpredictable, and unreliable is the biggest threat to the commercialization of large models, and the tendency to hallucinate is its most typical manifestation.

For now, it is unrealistic to expect an autonomous driving system to learn to weigh and discriminate among options and to output the optimal decision reliably.

An insider at an artificial intelligence company told Shentu: "There have indeed been many algorithm-level breakthroughs in visual perception. But the car is an extremely demanding scenario, and I personally don't expect a major breakthrough in the short term. Keep an eye on what Tesla does next."

Still, the recent trend in tech circles is that companies large and small all want a piece of the GPT hype. Some carmakers have announced that they are about to apply GPT-like technology, and the flood of dazzling concepts leaves people confused.

For example, an autonomous driving subsidiary of a traditional automaker released a generative large model for autonomous driving, billed as "the industry's first" to train autonomous driving with such a model.

An investor who has long followed the smart-car track asked an industry leader what he thought of the model. The reply was blunt: "Utter nonsense."

"It's just a PR act." The investor commented on Shentu.

Will autonomous driving be torn down and rebuilt?

Driven by Tesla, and pushed along by this year's AI wave, the autonomous driving industry is gradually converging on large models, large computing power, and big data.

Large models have not yet shaken autonomous driving to its core, but those with a keen nose for change are already ambivalent.

When Tesla used the Transformer to convert multi-camera data from image space into BEV space, it did not hesitate to scrap its original architecture and rewrite the algorithms. Adopting large models now may likewise mean tearing down existing autonomous driving algorithms and starting over.

He Zhiqiang believes large models will have a huge impact on autonomous driving. Systems that used to be assembled from many small models are giving way to a single large model, which may mean doing everything over again. The autonomous driving industry will be reshuffled.

Zhao Dongxiang, director of autonomous driving at an AI chip company, told Shentu that moving to an end-to-end approach amounts to starting from scratch.

A reshuffle is an opportunity for newcomers and a threat to incumbents; overtaking on the curve tends to happen in periods of rapid technological change. The more a company has invested in the old route, the larger its sunk costs and the harder it is to turn around. For OEMs and autonomous driving companies, embracing a new technology means weighing not only its effect but also its cost.

Zhao Dongxiang said that at the current stage, changing the technical route of autonomous driving makes little sense. "The industry's technical capability is already decent. Everyone has spent so much money over so many years that without a substantial improvement, there is no motivation to change."

At AI Day at the end of last year, Tesla upgraded its BEV to an occupancy network, further improving generalization. With the occupancy network, Tesla's perception system can decide whether something needs to be avoided without knowing what the object is, which solves more of the long-tail problems.
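
To illustrate why an occupancy representation helps with unrecognized objects, the sketch below assumes the perception model outputs a per-voxel occupancy probability and shows a planner checking whether a planned path crosses occupied space, without ever classifying the obstacle. The grid sizes and the path check are illustrative assumptions only, not Tesla's implementation.

```python
# Sketch: a planner can avoid occupied space without naming the object,
# given a grid of per-voxel occupancy probabilities from the perception model.
import numpy as np

def path_is_clear(occupancy, path_voxels, threshold=0.5):
    """occupancy: (X, Y, Z) probabilities in [0, 1]; path_voxels: (N, 3) voxel indices."""
    probs = occupancy[path_voxels[:, 0], path_voxels[:, 1], path_voxels[:, 2]]
    return bool((probs < threshold).all())   # clear only if no voxel on the path is occupied

# toy usage: a 20 m x 20 m x 4 m volume at 0.5 m resolution, with one "obstacle"
occ = np.zeros((40, 40, 8))
occ[20, 10:14, 0:4] = 0.9                    # something solid ahead, class unknown
straight_ahead = np.stack([np.arange(40), np.full(40, 12), np.full(40, 1)], axis=1)
print(path_is_clear(occ, straight_ahead))    # False: the path crosses occupied voxels
```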

Whatever the technical route, it is now changing and iterating rapidly. Yesterday's small models may be replaced by large models, and today's large models may in turn be replaced by some new species.

In any case, chasing hype and manufacturing gimmicks does nothing for technological progress. "Chasing the heat is a bad habit; what counts is building products in a down-to-earth way," Zhao Dongxiang said.

The real "king bomb" of autonomous driving is far from coming. What we need to do is to remain in awe of every round of technological change. The mythical GPT cannot build your dream car, but at least, changes have taken place.
