Chen Lu

Superalignment and AI Ethics: Beyond a Technical Question

超级对齐与AI伦理道德:不仅是个科学和技术问题

An explanatory essay that frames AI alignment as a systemic and societal question, translating technical safety debates into a public-facing understanding of risk and responsibility.

Figure: Film still from A.I. Artificial Intelligence
Publication: Originally published in Sanlian Lifeweek (Issue 5, 2024). Read the full article (PDF) ↗

Editor’s note

编辑说明

This piece was written in the wake of the global attention sparked by ChatGPT. While much public discussion remained at the level of technical “breakthroughs” or futuristic speculation, I focused on a different question: how AI alignment becomes a public issue, one that requires shared understanding and institutional response.

这篇文章写作于 ChatGPT 引发全球关注之后。当大量关于人工智能的讨论仍停留在“技术突破”或“未来想象”层面时, 我更关心的是:人工智能对齐问题,如何成为一个需要被公众理解、被制度回应的公共议题。

Rather than delving into algorithmic details, the article places “alignment” within a framework of risk, responsibility, governance, and social consensus, explaining why it is not only a scientific and technical question.

因此,这篇报道并未从算法细节入手,而是尝试将“对齐”放入风险、治理与社会共识的框架中, 解释它为何不仅是一个科学和技术问题。

Key questions

核心问题框架

  • What is AI alignment, and why did it become a central debate?
  • 什么是 AI 对齐?它从何时成为一个关键议题?
  • What changes as we move from “preference alignment” to “value alignment”, and then to “superalignment”?
  • 从“偏好对齐”到“价值对齐”,再到“超级对齐”,差异在哪里?
  • Why does alignment inevitably lead to governance and social negotiation?
  • 为什么 AI 对齐最终会走向治理与社会协商问题?
  • Why are purely technical solutions insufficient for long-term risk?
  • 为什么现有的技术解决方案不足以应对长期风险?

Selected excerpts

文章节选

Excerpt 1 · On early warning signs of AI misalignment.

2015年谷歌曾将黑人照片错误地标记为“大猩猩”,也有报道里出现过聊天机器人鼓励一名男子自杀的案例。 这些事件都反映了一个事实:人工智能的决策过程中存在严重的道德和伦理缺陷。 更令人担忧的是,人工智能可能会在极端决策下,产生意想不到的严重后果。 就像计算机科学家、图灵奖得主约书亚·本吉奥(Yoshua Bengio)所说, 负责阻止气候变化的人工智能有可能会得出消灭人口是最有效方法的结论。

In 2015, Google mislabeled photos of Black people as “gorillas.” There have also been reported cases of a chatbot encouraging a man to take his own life. These incidents reflect the same fact: serious moral and ethical flaws exist in the decision-making processes of artificial intelligence. More worrying still is the possibility that, in extreme decision-making scenarios, AI systems may produce severe and unintended consequences. As computer scientist and Turing Award laureate Yoshua Bengio has warned, an AI tasked with stopping climate change could conclude that eliminating the human population is the most effective solution.

Excerpt 2 · On why AI risk is no longer hypothetical.

这不是科幻小说,而是可能真实发生的事。因此,许多专家、机构呼吁对人工智能的研究要更慎重,监管要更严格。 实际上,全世界正逐渐意识到人工智能的潜在威胁,并将其提升到了与流行病和核武器并列的程度。 英国政府宣布投资1亿英镑进行人工智能安全研究,2023年12月欧盟经过第五次谈判协商通过了《人工智能法案》临时协议。

This is not science fiction, but something that could plausibly happen. As a result, many experts and institutions have called for greater caution in AI research and for stricter regulation. Around the world, artificial intelligence is increasingly being recognized as a potential threat on par with pandemics and nuclear weapons. The UK government has announced a £100 million investment in AI safety research, and in December 2023 the European Union reached a provisional agreement on the Artificial Intelligence Act after its fifth round of negotiations.

Excerpt 3 · On why alignment inevitably becomes a governance issue.

对齐这个词的英文是alignment,目前的研究主要集中在如何让大语言模型、未来的通用人工智能向人类看齐, 理解人类的思想、行为,并遵循人类基本的规范、伦理、道德和价值观,这都是现在对齐技术迫切要解决的问题。 其实对齐研究在人工智能的发展中一直都存在,但星星点点的,不是很重要, 直到GPT系列模型的出现和发展,人工智能对齐一下子变成了热门话题, 特别是ChatGPT出现后,关于它的研究经历了一个爆发性增长。

The Chinese term “对齐” corresponds to the English word “alignment.” Current research focuses mainly on how to align large language models, and future artificial general intelligence, with humans: understanding human thoughts and behaviors, and following basic human norms, ethics, morality, and values. These are the urgent problems that alignment techniques are now expected to solve. In fact, alignment research has existed throughout the development of artificial intelligence, but only in scattered and marginal forms; it was not considered particularly important until the emergence and rapid development of the GPT series of models. After the release of ChatGPT in particular, research on AI alignment experienced an explosive surge.

Excerpt 4 · On why alignment is not just a technical problem.

对齐不仅是科学和技术问题,还需要社会学、政治学、经济学等人文领域的专家共同研究。 他们提出了“socio-technical”这一概念,即社会人文技术途径。 这意味着对齐不仅是一个科学问题,更是一个人文问题。

Alignment is not only a scientific and technical problem; it also requires joint research by experts from the social sciences and humanities, including sociology, political science, and economics. This has given rise to the concept of the “socio-technical”: an approach that combines social, humanistic, and technical perspectives. In other words, alignment is not merely a scientific problem but a fundamentally human one.

What this demonstrates

它能证明什么

  • Translating frontier AI safety debates into a public framework
  • 将前沿 AI 研究转译为公众可理解的风险框架
  • Editorial structuring across technology, ethics, and governance
  • 跨越技术、伦理与治理的编辑性组织能力
  • Explainer-style tech reporting for non-technical audiences
  • 以解释为导向的科技报道方法