Chen Lu

Embodied Intelligence Enters the Physical World: Are We Ready?

具身智能进入物理世界,我们准备好了吗?

An interview-based explainer that uses Robotaxi and humanoid robots as the “first contact” moment—then pulls the discussion back to risk, traceability, and governance in real-world deployment.

Figure: Film still from A.I. Artificial Intelligence
Figure: On March 14, Apptronik’s humanoid robot Apollo performed with Czech dancer Yemi at the SXSW Conference and Festivals.
Publication · Originally published in Sanlian Lifeweek (Issue 34, 2024)

Editor’s note

编辑说明

This piece treats “embodied AI” less as a slogan and more as a governance problem that arrives the moment systems touch people, streets, factories, and homes. The reporting logic is simple: start with what the public can see (Robotaxi, humanoids), then trace what must be built behind the curtain—standards, traceability, and accountability.

这篇报道并不把“具身智能”当作口号,而把它当作一类在进入街道、工厂与家庭时必然触发的治理问题。写作方法也很直接:从公众能看见的应用(Robotaxi、人形机器人)切入,再追问其背后的标准、可追溯机制与责任结构。

Key questions

核心问题框架

  • Why did autonomous driving become the first “mass-facing” embodied AI application?
  • 为什么自动驾驶会成为公众最先接触到的具身智能应用?
  • What exactly changes when AI moves from text to bodies, sensors, and actuators?
  • 当 AI 从文本走向身体、传感器与执行器,究竟改变了什么?
  • Where do the biggest risks live: in models, in systems, or in deployment contexts?
  • 最关键的风险究竟来自模型本身、系统结构,还是部署场景?
  • What kinds of traceability and standards are missing before “entering homes” becomes normal?
  • 在大规模进入家庭之前,哪些可追溯机制与标准仍是空白?

Selected excerpts

文章节选

Excerpt 1 · Why autonomous driving moved first: fewer degrees of freedom.

That's true. One reason autonomous driving has advanced so quickly is that its operation is relatively simple and its control dimensions are few: the steering wheel, throttle, and brake add up to only three degrees of freedom (a mechanics concept: the number of independently controllable motion parameters in a system). Humanoid robots are far more complex. A single dexterous hand can have more than twenty degrees of freedom, to say nothing of the joints in the legs, waist, head, and arms, which can push the total past fifty. This makes the operational challenge humanoid robots face far greater than that of autonomous driving.

确实如此。自动驾驶技术之所以能快速推进,首先是因为它的操作相对简单,控制维度较低,比如方向盘、油门和刹车,加起来也只有三个自由度(机构学概念,指机械系统中可独立控制的运动参数个数)。而人形机器人复杂得多,一个灵巧的机械手就可能有20多个自由度,更别提腿、腰、头部、手臂等部位的关节了,总共可能有50多个自由度。这使得人形机器人在操作层面面临的挑战远大于自动驾驶。
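The arithmetic in this excerpt can be made concrete. The tally below is a sketch only; the per-joint counts are illustrative assumptions in line with the figures quoted above, not the specification of any real robot:

```python
# Illustrative tally of control degrees of freedom (DOF).
# All joint counts are assumptions for this sketch, not product specs.

car = {"steering": 1, "throttle": 1, "brake": 1}

humanoid = {
    "hand_left": 21, "hand_right": 21,  # a dexterous hand alone can exceed 20 DOF
    "arm_left": 7, "arm_right": 7,
    "leg_left": 6, "leg_right": 6,
    "waist": 3, "head": 2,
}

print(sum(car.values()))       # 3
print(sum(humanoid.values()))  # 73, i.e. well past fifty
```

The point of the comparison survives any reasonable choice of joint counts: driving collapses to a handful of control dimensions, while whole-body manipulation multiplies them.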

Excerpt 2 · What makes “embodied intelligence” different: a language-model “brain” meets physical control.

具身智能可以分为“大脑”和“小脑”两部分。大脑部分类似于大模型,如ChatGPT,能够实时判断并理解复杂任务,比如知道你饿了,就去冰箱拿三明治。这种常识性判断是传统控制论无法实现的。小脑则负责具体的操作和控制,如手脚的动作、拿起物品、开门等。

Embodied intelligence can be thought of as having two parts: a “brain” and a “cerebellum.” The brain is closer to a large model, something like ChatGPT, capable of making real-time judgments about complex tasks the way a person would: you’re hungry, so you go to the fridge and grab a sandwich. That kind of commonsense reasoning is not what traditional control theory was built to do. The cerebellum handles the concrete operation and control: moving hands and feet, picking things up, opening a door.
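The division of labor described in this excerpt can be sketched as two layers: a “brain” that turns an open-ended goal into a task plan, and a “cerebellum” that turns each task into motor commands. Everything below is a toy illustration under that framing; the function names and the canned lookup-table plan are assumptions, not the interviewee's system:

```python
# Sketch of the "brain vs. cerebellum" split described in the excerpt.
# plan_with_llm stands in for a large model; a lookup table replaces real inference.

def plan_with_llm(goal: str) -> list[str]:
    """'Brain': commonsense task planning (toy stand-in for an LLM)."""
    plans = {
        "I'm hungry": ["walk to fridge", "open door", "grasp sandwich"],
    }
    return plans.get(goal, [])

def execute(task: str) -> str:
    """'Cerebellum': low-level motion control, reduced here to a status string."""
    return f"motor commands issued for: {task}"

for task in plan_with_llm("I'm hungry"):
    print(execute(task))
```

The design choice the excerpt gestures at is exactly this interface: the brain never touches actuators directly, and the cerebellum never needs commonsense, so each layer can be developed and audited separately.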

Excerpt 3 · When machines inhabit the same space as humans, risk turns physical and emotional.

机器人在与人对话时,不仅可能因价值观问题引发情感伤害;在物理空间中,它们甚至可能对人身安全造成威胁,最简单的比如踩到人。这些风险在基于文本的人工智能应用中尤为明显。现有的聊天机器人可以模拟情侣或朋友的角色,这种个性化互动可能让用户产生情感依赖。一旦这些机器人变得具身化,风险就更大了。

When a robot talks with a person, it can cause emotional harm through value conflicts; once it operates in physical space, it can also threaten bodily safety—something as blunt as stepping on someone. These risks are already visible in text-based AI. Today’s chatbots can simulate the role of a partner or a friend; that kind of tailored intimacy can foster emotional dependence. Give the same system a body, and the stakes rise with it.

Excerpt 4 · The missing infrastructure: definitions, sensors, and something like a black box.

设想一下,一个人形机器人在家中正要迈步,这时有人突然摔倒。如果机器人能意识到不该踩下去,那接下来该怎么处理?它是应该回退到几秒前,还是继续前进但小心跨过障碍?这些细节都需要有明确的定义。为了确保安全,机器人可能需要在脚上安装摄像头,以检测周围环境。这种思路可能会彻底改变行业的安全标准和技术发展方向。此外,机器人是否还需要类似“黑匣子”的设备,记录所有操作数据,并将这些数据上传到云端,由政府进行监管,也是需要考虑的问题。

Imagine a humanoid robot taking a step at home when someone suddenly falls. If the robot “knows” it shouldn’t step down, what happens next? Does it retreat to where it stood a few seconds earlier, or continue forward while carefully stepping over the obstacle? Details like these need explicit definitions. For safety, the robot may need cameras on its feet to monitor its immediate surroundings, an idea that could rewrite industry safety standards and redirect technical development. And there is the question of whether robots will need something like a black box: a device that records every operation and uploads the data to the cloud for government oversight.
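The “black box” the excerpt asks about amounts to an append-only operation log that can later be handed to a regulator. A minimal sketch, with the class shape and all field names assumed for illustration:

```python
import json
import time


class BlackBox:
    """Append-only recorder for robot operations (field names are assumptions)."""

    def __init__(self):
        self._events = []

    def record(self, action: str, sensor_snapshot: dict) -> None:
        # Each entry pairs a timestamped action with the sensor state behind it,
        # so an incident can be reconstructed after the fact.
        self._events.append({
            "t": time.time(),
            "action": action,
            "sensors": sensor_snapshot,
        })

    def export(self) -> str:
        """Serialize the full log, e.g. for upload to a supervised endpoint."""
        return json.dumps(self._events)


box = BlackBox()
box.record("step_forward", {"foot_camera": "obstacle: person"})
box.record("abort_step", {"foot_camera": "obstacle: person"})
print(len(json.loads(box.export())))  # 2
```

Even this toy version makes the governance questions concrete: who holds the log, how long it is retained, and whether “every operation” includes raw sensor frames or only decisions.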

What this demonstrates

它能证明什么

  • Turning emerging tech into a public-facing risk-and-governance framework
  • 把新兴技术转译为公众可理解的风险与治理框架
  • Interview-driven explainers with strong systems logic (deployment → standards → accountability)
  • 以采访驱动的解释型写作:从部署场景推导到标准与责任结构
  • Clear editorial structuring across industry context, technical constraints, and social impact
  • 在产业语境、技术约束与社会影响之间做清晰的编辑性组织