Chen Lu

iFLYTEK: Practicing Voice AI for Everyday Life in China

科大讯飞,中国语音智能生活的实践者

A reported feature from iFLYTEK’s Hefei headquarters, tracing through products, platforms, and language constraints how “voice intelligence” tries to become infrastructure.

Figure: Robot independently developed by iFLYTEK
Publication: Originally published in Modern Weekly (June 16, 2018, Issue 24). Read full article (PDF)

Editor’s note

编辑说明

This reported feature comes from my technology column for Modern Weekly (《周末画报》). I visited iFLYTEK’s headquarters in Hefei to understand how speech AI moved from demos into daily products: what actually scales, what doesn’t, and why.

本文出自我为《周末画报》撰写的科技专栏。我走访科大讯飞合肥总部,试图从真实产品与研发现场出发,理解语音 AI 如何从技术展示走向日常应用:哪些场景真正“跑得起来”,哪些环节仍受限,以及背后的原因。

Key questions

核心问题框架

  • Which “strong-need” scenarios make speech AI scale first?
  • 语音 AI 最先在哪些“强刚需”场景规模化?
  • What are the technical bottlenecks unique to Chinese speech recognition?
  • 中文语音识别有哪些独特难点与瓶颈?
  • How do latency, content ecosystems, and personalization shape home use?
  • 响应速度、内容生态与个性化如何决定家庭场景体验?

Selected excerpts

文章节选

Excerpt 1 · What “landing” looks like: products, demos, and an ecosystem on display.

进入展厅,悬挂在大门一侧的两块显示屏上实时显示着当前AIUI 开放平台参与的开发者总人数、分布地区和产品数量等信息;它旁边的另一块荧幕上则正播放着一段特朗普的语音合成视频,在这个视频中,特朗普先用英文,再用中文说道,“科大讯飞真是太棒了”,但事实上,特朗普并不会讲中文,这是该公司利用其技术玩的一个宣传小花招;往前走,被放置在桌上的一个小小黑匣子,是刘庆峰曾在两会期间演示过的讯飞晓译翻译机,其准确率较高的实时双向翻译功能赢得了不少国外出行者的喜爱;两个造型十分可爱、颇为吸引眼球的阿尔法大蛋、小蛋是科大讯飞推出的儿童智能陪护型机器人,它们正闪烁着笑脸等待用户的唤醒;桌上最为人熟悉的另一款产品则是2015 年科大讯飞联合京东推出的叮咚智能音箱,这也是中国第一款自主研发的智能语音音箱;另外两个非常引人注目的是站立于展厅中的两个与人等高的机器人,其中一台名为“小途”的星途机器人能够为人们政务服务,而另一个机器人“晓医”则在不久前刚刚通过了国家医疗执照考试,取得了超出标准分数96 分的成绩,成为历史上第一个取得这种成就的机器人,可以作为“智医助理”帮助医生捕捉和分析患者信息,提供人工智能辅助诊疗;此外,一辆受到众多男性参观者喜爱的汽车驾驶舱则展示了科大讯飞语音技术在车载系统中的运用,智能车载是如今智能语音技术被广泛应用的一个领域,奥迪、宝马、奔驰、通用、上汽和奇瑞等国内外汽车制造厂商已经和科大讯飞进行合作,将为中国用户提供便捷的智能服务。

Inside the showroom, two screens mounted by the entrance display real-time data from the AIUI open platform: the total number of participating developers, their geographic distribution, and the number of products built on the system. On a nearby screen, a voice-synthesis video plays. In it, Donald Trump speaks first in English and then, improbably, in Chinese: “iFLYTEK is amazing.” Trump, of course, does not speak Chinese. The clip is a small promotional trick, made possible by the company’s speech synthesis technology. Further inside, a small black box sits on a table: the iFLYTEK Translator, once demonstrated by Liu Qingfeng during China’s Two Sessions. Its relatively accurate real-time, two-way translation has made it popular among travelers abroad. Nearby are two child-oriented companion robots, the large and small Alpha Egg, both round and designed to be immediately endearing. They flash smiling faces, waiting to be awakened by a voice command. Another familiar product on the table is the DingDong smart speaker, launched jointly by iFLYTEK and JD.com in 2015, the first domestically developed smart voice speaker in China. The most eye-catching objects in the room, however, are two human-height robots standing in the exhibition space. One, named “Xiaotu,” is designed for government service scenarios; the other, “Xiaoyi,” has recently passed China’s national medical licensing examination with a score 96 points above the passing line, becoming the first robot to do so. As an “AI medical assistant,” it can help doctors capture and analyze patient information and support clinical decision-making. Elsewhere in the showroom, a car cockpit draws the attention of many visitors. It demonstrates how iFLYTEK’s voice technology is deployed in in-vehicle systems, a domain where voice AI has already found widespread application. Automakers including Audi, BMW, Mercedes-Benz, General Motors, SAIC, and Chery have partnered with iFLYTEK to bring voice-enabled services to Chinese drivers.

Excerpt 2 · The technical ceiling: accuracy, dialects, and why each extra “1%” is hard.

赵艳军介绍道,到今年,科大讯飞普通话的识别率已经可以达到98%,而这个数值在2010 年刚发布讯飞开放平台时仅为60% 至70%,经过不断的迭代训练以及优化,去年达到97%。“越到后面要提高识别率就越困难,”赵艳军解释道,将识别率从70% 提高至98%,其相对错误率需要下降30%,但如果要从98% 提高到99%,它的相对错误率则要下降50% 以上。针对方言问题,AIUI 开放平台在3.0 新版本中为用户开放了23种方言的识别技术,其中超过一半的识别率可以达到90% 以上。训练机器进行语音识别时,某种语言覆盖的用户人群所产生数据样本越大,优化迭代速度就越快,这就是为什么拥有每天超过2 亿活跃用户的普通话,其识别率能够快速提高。方言中,像粤语和四川话这些人群覆盖度比较大,识别率也会相对较高。

According to Zhao Yanjun, iFLYTEK’s Mandarin speech recognition accuracy has reached 98 percent as of this year. When the company first launched its open platform in 2010, that figure hovered between 60 and 70 percent. Through years of iterative training and optimization, it reached 97 percent last year. “The closer you get to the ceiling, the harder every step becomes,” Zhao said. By his account, raising accuracy from 70 to 98 percent requires reducing the relative error rate by about 30 percent; pushing it from 98 to 99 percent, however, demands a reduction of more than 50 percent. Dialects are a separate challenge. In version 3.0 of the AIUI open platform, iFLYTEK opened recognition for 23 Chinese dialects, and more than half of them now achieve recognition rates above 90 percent. In speech recognition, Zhao explained, the size of the user base matters: the larger the data sample generated by a language’s speakers, the faster the system can be optimized and iterated. This is why Mandarin, with more than 200 million daily active users, has improved so rapidly. Dialects with larger speaker populations, such as Cantonese and Sichuanese, also tend to reach higher recognition rates.
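To make the “each extra 1%” arithmetic concrete, here is a minimal sketch of how relative error reduction is commonly computed: the share of remaining errors that must be eliminated to move from one accuracy level to another. The function name and the definition used are my own assumptions for illustration, not iFLYTEK’s internal metric, and the 97-to-98 step is included only for comparison with the figures Zhao cites.

```python
def relative_error_reduction(acc_before_pct: float, acc_after_pct: float) -> float:
    """Return the fraction of remaining errors removed when accuracy rises.

    Hypothetical helper: assumes error rate = 100 - accuracy (in percent).
    """
    err_before = 100.0 - acc_before_pct
    err_after = 100.0 - acc_after_pct
    if err_before <= 0:
        raise ValueError("accuracy before must be below 100 percent")
    return (err_before - err_after) / err_before


# Accuracy steps mentioned in the excerpt, plus 97 -> 98 for comparison.
for before, after in [(70, 98), (97, 98), (98, 99)]:
    reduction = relative_error_reduction(before, after)
    print(f"{before}% -> {after}%: {reduction:.0%} of remaining errors removed")
```

Under this definition, moving from 98 to 99 percent means halving the errors that remain, which matches the “50 percent” figure in the excerpt. The same definition gives roughly one third for the 97-to-98 step and a much larger share for 70 to 98, so the quoted “about 30 percent” may rest on a different baseline or definition; the article does not specify which.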

What this demonstrates

它能证明什么

  • Reported, product-facing tech writing grounded in field observation
  • 基于现场与产品细节的应用型科技写作
  • Explaining technical constraints (accuracy, dialects, latency) without losing readability
  • 把识别率、方言与延迟等技术约束写得清楚且可读
  • Connecting platform ecosystems (developers, content partners) to real adoption paths
  • 将平台生态(开发者与内容合作)与真实落地路径连接起来