After building "large models," will "embodied intelligence" lead the next wave of AI?

03/21 2024

Senior robotics expert Eric Jang recently predicted: "ChatGPT emerged overnight. I believe intelligent robotics technology will also do so."

Late at night on March 13th, a video of a humanoid robot began to spread widely.

In the video, Figure's humanoid robot holds a fluent conversation with a person, understands the person's intent, and follows natural-language instructions to grasp and place objects, explaining why it acts as it does.

Behind it all is a large model provided by OpenAI. Because the OpenAI model supports multimodal input, it gives the Figure robot high-level visual and language intelligence.

Figure itself was founded in 2022. Only 13 days passed between OpenAI announcing its collaboration with Figure and the two companies jointly presenting a robot capable of autonomous conversation and decision-making.

The development of embodied intelligence is clearly accelerating.

Embodied intelligence is evolving faster than imagined

At last year's ITF World 2023 Semiconductor Conference, NVIDIA's founder and CEO, Jensen Huang, stated that the next wave of artificial intelligence will be "embodied intelligence": intelligent systems capable of understanding, reasoning about, and interacting with the physical world. The integration of AI and robotics opens up vast possibilities.

He also introduced NVIDIA's multimodal embodied intelligence system, VIMA, which, guided by visual and text prompts, can execute complex tasks, acquire concepts, understand boundaries, and even simulate physics, marking a significant advance in AI capabilities.

Furthermore, at Tesla's 2023 Annual Shareholders Meeting, Elon Musk showcased a new model of the humanoid robot Optimus, which comes close to being an embodied intelligent robot.

Musk stated that humanoid robots will be Tesla's primary source of long-term value, and he believes that products such as embodied intelligent robots are poised to become the next wave of AI.

The concept of embodied intelligence is often traced back to 1950, when Turing raised the idea in his paper "Computing Machinery and Intelligence."

Embodied AI refers to intelligent agents that have a physical body and support physical interaction, such as intelligent service robots and autonomous vehicles. Embodied intelligent robots are robots that can interact with their environment as humans do: perceiving, planning, making decisions, acting, and executing tasks.

The field draws on almost every technology in artificial intelligence, including machine vision, natural language understanding, cognition and reasoning, robotics, game theory and ethics, and machine learning. It spans multiple disciplines and is, in that sense, a culmination of AI.

2023 was a year of explosive growth for generative AI, and industry insiders also call it the "year of robot awakening." The integration of generative AI such as ChatGPT with the humanoid robot industry has ushered in the era of embodied intelligence.

Today, driven by the popularization of large language models and cutting-edge models like GPT-4, we seem to be witnessing a new era in the field of artificial intelligence, with human-computer communication becoming unprecedentedly smooth and seamless.

A report released by GGII in May 2023 predicts that by 2026 humanoid robots will reach a penetration rate of 3.5% in the global service robot market, with a market size exceeding US$2 billion.

Leading researchers from technology companies and academia continue to move into research and product development in this field.

However, behind the prosperity and enthusiasm, potential difficulties also loom. Although models like ChatGPT have revolutionized the AI field, they still fail to fully meet public expectations in terms of understanding, associative thinking, and interaction capabilities.

This prompts us to reassess seemingly unimpeded progress while hoping that through relentless efforts, people can overcome the complex challenges facing the realization of true embodied intelligence.

When robots meet large models

In recent years, several domestic companies have released independently developed humanoid robots. Humanoid robots are the most complex category of robots, so what does "embodied intelligence" mean for robots? What changes will occur when large models are combined with robots?

With "embodied intelligence," robots possess autonomous learning and planning capabilities, allowing them to respond on their own and overcome obstacles and difficulties quickly.

Currently, there are over 200 large models in China. Humanoid robots, in effect, serve as a carrier for them. When large models are combined with humanoid robots, the robots help the models perceive the physical world and act on their surroundings, using multimodal perception to control their bodies and complete complex tasks.

In the first half of 2023, large language models represented by ChatGPT took off. The maturity of large language models, and of complex multimodal models that combine vision with other sensors, is a crucial prerequisite for realizing embodied intelligence in robots.

The most crucial point is that mature large AI models enable robots to shift from executing fixed programs to pursuing task goals, a solid step towards general-purpose robots.

Put simply, the integration of large models and robots finally gives robots, after a long period of development, a real "brain."
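The shift from program orientation to goal orientation can be illustrated with a toy sketch. Everything in it is hypothetical: the `World` class, the action strings, and the `hand_over` goal are invented for illustration, and the rule-based `plan_for_goal` function is only a stand-in for a large model, not how any real robot or model works.

```python
from dataclasses import dataclass, field


@dataclass
class World:
    """Toy world state: maps each visible object to its location (hypothetical)."""
    locations: dict = field(default_factory=dict)


def run_scripted(world: World) -> list:
    """Program-oriented: a fixed action sequence that ignores the world state."""
    return ["move_to(table)", "grasp(apple)", "move_to(person)", "release(apple)"]


def plan_for_goal(goal: str, world: World) -> list:
    """Goal-oriented: derive the steps from the goal and the observed state.

    A trivial rule-based stand-in for the large model, not a real planner.
    """
    verb, obj = goal.split(maxsplit=1)
    if verb != "hand_over":
        raise ValueError(f"unsupported goal: {goal}")
    if obj not in world.locations:
        # The object is not visible, so explain instead of acting blindly.
        return [f"say('I cannot see the {obj}')"]
    place = world.locations[obj]
    return [f"move_to({place})", f"grasp({obj})", "move_to(person)", f"release({obj})"]


# The apple is on the shelf, not on the table the script assumes.
world = World(locations={"apple": "shelf"})
scripted_steps = run_scripted(world)
goal_steps = plan_for_goal("hand_over apple", world)
print(scripted_steps)
print(goal_steps)
```

In this sketch the scripted routine still heads for the table, while the goal-oriented routine adapts its steps to where the apple actually is, or says so when the object cannot be found.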

The robot's "cloud brain" builds robotic intelligence from distributed algorithms, computing power, and big data across the cloud, the edge, and the terminal. Secure, high-speed wireless networks such as 5G connect the cloud brain to the robot body, which carries out tasks on the "terminal" side.

The cloud brain utilizes advanced technologies such as artificial enhancement, multimodal fusion AI, and digital generation to enable robots to self-learn, continuously evolve, and grow intelligently.

The era of AI truly empowering various industries and intelligent robots entering every household is approaching. With technological breakthroughs leading to improved cost-effectiveness, the penetration rate of embodied intelligence is expected to accelerate in the future.

Goldman Sachs predicts that under ideal conditions, in which significant near-term breakthroughs in robot hardware and software achieve embodied intelligence while costs fall by 20% annually, the global market for humanoid robots could reach US$154 billion in 2035. That would approach the size of the smart car market in 2021 and imply a compound annual growth rate of 94% from 2025 to 2035.

In optimistic scenarios, the shipment volume of humanoid robots is expected to reach 1 million units in 2035, with a compound annual growth rate of the market space expected to reach 59% from 2025 to 2035.
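The growth figures quoted above can be sanity-checked with the standard compound-growth formula. The sketch below is illustrative arithmetic rather than part of the Goldman Sachs analysis: the function names are hypothetical, and it assumes ten compounding years (2025 to 2035) and that the 20% annual cost reduction holds for that whole period, neither of which the projection states in this form.

```python
def implied_start_value(end_value: float, cagr: float, years: int) -> float:
    """Back out the starting value from: end = start * (1 + cagr) ** years."""
    return end_value / (1.0 + cagr) ** years


def remaining_cost_fraction(annual_reduction: float, years: int) -> float:
    """Fraction of the original cost left after compounding annual reductions."""
    return (1.0 - annual_reduction) ** years


# US$154 billion in 2035 at a 94% CAGR over 10 years (assumed horizon).
base_2025 = implied_start_value(154.0, 0.94, 10)

# Cost falling by 20% a year, assumed to hold for all 10 years.
cost_2035 = remaining_cost_fraction(0.20, 10)

print(f"Implied 2025 market size: US${base_2025:.2f} billion")
print(f"Cost in 2035 as a share of 2025 cost: {cost_2035:.1%}")
```

Under these assumptions the implied 2025 market is roughly US$0.2 billion, and unit cost falls to about 11% of its starting level, which shows how small a base the 94% growth rate starts from.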

The era of humanoid robots is arriving

On November 2, 2023, the Ministry of Industry and Information Technology issued the "Guiding Opinions on the Innovative Development of Humanoid Robots" (hereinafter referred to as the "Opinions"), pointing out the direction for the development of humanoid robots. The "Opinions" stated that by 2025, an innovative system for humanoid robots will be initially established, with breakthroughs in a batch of key technologies such as "brains, cerebellums, and limbs," ensuring the safe and effective supply of core components. The overall product will reach international advanced levels and achieve mass production.

On January 17, 2024, David Holz, the founder of AI research laboratory Midjourney, wrote in a social media post: "We have reason to expect that by 2040, there will be 1 billion humanoid robots on Earth. By 2060, there will be 100 billion humanoid robots in the world." This heralds the arrival of a new era for humanoid robots.

In recent years, the academic attention on embodied intelligence has continued to increase. At the Conference on Robot Learning (CoRL), the number of papers in the field of embodied intelligence has shown a rapid growth trend.

At the International Conference on Intelligent Robots and Systems (IROS) held in October 2023, embodied intelligence was also discussed in depth as an extremely important topic.

At the Humanoid Robot Technology and Industrial Development Forum of the World Robot Conference on August 18, 2023, Yao Qizhi, the winner of the 2000 Turing Award, academician of the Chinese Academy of Sciences, and Dean of the Institute for Interdisciplinary Information Sciences at Tsinghua University, pointed out that the future development of artificial general intelligence (AGI) requires embodied entities that interact with the real physical world to complete various tasks.

Only in this way can greater value be brought to the industry.

As China's population ages, labor shortages are becoming increasingly prominent. At the same time, the manufacturing labor force is shrinking and labor costs are rising, making "machines replacing humans" an important trend.

Currently, the global deployment of industrial robots is growing steadily, and China has become the world's largest robot market. Humanoid robots, structurally similar to humans, are expected to cover and replace all original work scenarios requiring human labor in the future.

According to a report by CCID Consulting, although various humanoid robots are still in the early stages of prototype development, the potential technological changes they bring and the changes they make to certain production and living scenarios deserve close attention.

Humanoid robots have great development potential in fields such as manufacturing, space exploration, life service industries, and university research. It is expected that by 2025, humanoid robots will achieve breakthroughs in manufacturing scenario applications, with small-scale applications in electronics, automobiles, and other manufacturing environments.

In China, the field of intelligent manufacturing will become the first area where humanoid robots achieve large-scale applications. Humanoid robots will redefine workers in the era of artificial intelligence, focusing on three scenarios: industrial manufacturing, commercial services, and family companionship, freeing humans from repetitive labor.

Commercial services are the fastest-developing market for humanoid robot applications, while the home is the most promising one.

Recently, American technology company NVIDIA announced the establishment of the Generalist Embodied Agent Research Lab (GEAR).

Since last year, several domestic companies, including CETC 21, Zhiyuan Robotics, iFLYTEK, Xpeng Motors, and Fourier Intelligence, have successively released independently developed embodied intelligent robots, and several companies plan to achieve commercialization of embodied intelligence this year.

The industry generally believes that 2024 is expected to become the first year of commercialization for embodied intelligence.

The virtual world, in stark contrast to the real world, provides a more precise and controllable environment, enabling agents to engage in bolder and more innovative behaviors.

This is not only an extension of human intelligence but also a stage for the birth and development of artificial general intelligence, providing an ideal testing ground and growth space for AI to surpass human intelligence levels.

Perhaps this is the underlying reason why major technology companies place high hopes on embodied intelligence and the virtual world.

This signals that a more intelligent and interconnected future is approaching.


[Original report by Technology Cloud]
