Chen Lu

Superalignment and AI Ethics: Beyond a Technical Question

超级对齐与AI伦理道德:不仅是个科学和技术问题

An explanatory essay that frames AI alignment as a systemic and societal question, translating technical safety debates into a public-facing understanding of risk and responsibility.

Figure: Film still from A.I. Artificial Intelligence
Publication: Originally published in Sanlian Lifeweek (Issue 5, 2024). Read the full article (PDF) ↗

Editor’s note

编辑说明

This piece was written in the wake of the global attention sparked by ChatGPT. While much public discussion remained at the level of technical “breakthroughs” or futuristic speculation, I focused on a different question: how AI alignment becomes a public issue, one that requires shared understanding and institutional response.

这篇文章写作于 ChatGPT 引发全球关注之后。当大量关于人工智能的讨论仍停留在“技术突破”或“未来想象”层面时, 我更关心的是:人工智能对齐问题,如何成为一个需要被公众理解、被制度回应的公共议题。

Rather than delving into algorithmic details, the article places “alignment” within a framework of risk, responsibility, governance, and social consensus, explaining why it is not only a scientific and technical question.

因此,这篇报道并未从算法细节入手,而是尝试将“对齐”放入风险、治理与社会共识的框架中, 解释它为何不仅是一个科学和技术问题。

Key questions

核心问题框架

  • What is AI alignment, and why did it become a central debate?
  • 什么是 AI 对齐?它从何时成为一个关键议题?
  • What changes as we move from “preference alignment” to “value alignment”, and then to “superalignment”?
  • 从“偏好对齐”到“价值对齐”,再到“超级对齐”,差异在哪里?
  • Why does alignment inevitably lead to governance and social negotiation?
  • 为什么 AI 对齐最终会走向治理与社会协商问题?
  • Why are purely technical solutions insufficient for long-term risk?
  • 为什么现有的技术解决方案不足以应对长期风险?

Selected excerpts

文章节选

Excerpt 1 · On early warning signs of AI misalignment.

2015年谷歌曾将黑人照片错误地标记为“大猩猩”,也有报道里出现过聊天机器人鼓励一名男子自杀的案例。 这些事件都反映了一个事实:人工智能的决策过程中存在严重的道德和伦理缺陷。 更令人担忧的是,人工智能可能会在极端决策下,产生意想不到的严重后果。 就像计算机科学家、图灵奖得主约书亚·本吉奥(Yoshua Bengio)所说, 负责阻止气候变化的人工智能有可能会得出消灭人口是最有效方法的结论。

In 2015, Google mislabeled photos of Black people as “gorillas.” There have also been reported cases of a chatbot encouraging a man to take his own life. These incidents reflect the same fact: serious moral and ethical flaws exist in the decision-making processes of artificial intelligence. More worrying still is the possibility that, in extreme decision-making scenarios, AI systems may produce severe and unintended consequences. As computer scientist and Turing Award laureate Yoshua Bengio has warned, an AI tasked with stopping climate change could conclude that eliminating the human population is the most effective solution.

Excerpt 2 · On why AI risk is no longer hypothetical.

这不是科幻小说,而是可能真实发生的事。因此,许多专家、机构呼吁对人工智能的研究要更慎重,监管要更严格。 实际上,全世界正逐渐意识到人工智能的潜在威胁,并将其提升到了与流行病和核武器并列的程度。 英国政府宣布投资1亿英镑进行人工智能安全研究,2023年12月欧盟经过第五次谈判协商通过了《人工智能法案》临时协议。

This is not science fiction, but something that could plausibly happen. As a result, many experts and institutions have called for greater caution in AI research and for stricter regulation. Around the world, artificial intelligence is increasingly being recognized as a potential threat on par with pandemics and nuclear weapons. The UK government has announced a £100 million investment in AI safety research, and in December 2023 the European Union reached a provisional agreement on the Artificial Intelligence Act after its fifth round of negotiations.

Excerpt 3 · On why alignment inevitably becomes a governance issue.

对齐这个词的英文是alignment,目前的研究主要集中在如何让大语言模型、未来的通用人工智能向人类看齐, 理解人类的思想、行为,并遵循人类基本的规范、伦理、道德和价值观,这都是现在对齐技术迫切要解决的问题。 其实对齐研究在人工智能的发展中一直都存在,但星星点点的,不是很重要, 直到GPT系列模型的出现和发展,人工智能对齐一下子变成了热门话题, 特别是ChatGPT出现后,关于它的研究经历了一个爆发性增长。

The Chinese term “对齐” corresponds to the English word “alignment.” Current research focuses mainly on how to align large language models, and future artificial general intelligence, with humans: understanding human thoughts and behaviors, and following basic human norms, ethics, morality, and values. These are the urgent problems that alignment techniques are now expected to solve. In fact, alignment research has existed throughout the development of artificial intelligence, but only in scattered and marginal forms; it was not considered particularly important until the emergence and rapid development of the GPT series of models. After the release of ChatGPT in particular, research on AI alignment experienced an explosive surge.

Excerpt 4 · On why alignment is not just a technical problem.

对齐不仅是科学和技术问题,还需要社会学、政治学、经济学等人文领域的专家共同研究。 他们提出了“socio-technical”这一概念,即社会人文技术途径。 这意味着对齐不仅是一个科学问题,更是一个人文问题。

Alignment is not only a scientific and technical problem; it also requires joint research by experts from the social sciences and humanities, including sociology, political science, and economics. This has given rise to the concept of the “socio-technical”: an approach that combines social, humanistic, and technical perspectives. In other words, alignment is not merely a scientific problem but a fundamentally human one.

What this demonstrates

它能证明什么

  • Translating frontier AI safety debates into a public framework
  • 将前沿 AI 研究转译为公众可理解的风险框架
  • Editorial structuring across technology, ethics, and governance
  • 跨越技术、伦理与治理的编辑性组织能力
  • Explainer-style tech reporting for non-technical audiences
  • 以解释为导向的科技报道方法