A reported feature from iFLYTEK’s Hefei headquarters—through products, platforms, and language constraints, tracing how “voice intelligence” tries to become infrastructure.
This reported feature comes from my technology column for Weekend Pictorial. I visited iFLYTEK’s headquarters in Hefei to understand how speech AI moved from demos into daily products—what actually scales, what doesn’t, and why.
Excerpt 1 · What “landing” looks like: products, demos, and an ecosystem on display.
Inside the showroom, two screens mounted by the entrance display real-time data from the AIUI open platform: the total number of participating developers, their geographic distribution, and the number of products built on the system. On a nearby screen, a voice-synthesis video plays on a loop. In it, Donald Trump speaks first in English and then, improbably, in Chinese: “iFLYTEK is amazing.” Trump, of course, does not speak Chinese. The clip is a small promotional trick, made possible by the company’s speech synthesis technology.

Further inside, a small black box sits on a table: the iFLYTEK Translator, once demonstrated by founder Liu Qingfeng during China’s Two Sessions. Its relatively accurate real-time, two-way translation has made it popular among Chinese travelers abroad. Nearby are two child-oriented companion robots, Alpha Egg Big and Alpha Egg Small, both round, brightly colored, and designed to be immediately endearing. They blink cheerfully, waiting to be awakened by a voice command. Another familiar product on the table is the DingDong smart speaker, launched jointly by iFLYTEK and JD.com in 2015 and the first domestically developed smart speaker in China.

The most eye-catching objects in the room, however, are two human-height robots standing at the center of the exhibition space. One, named “Xiaotu,” is designed for government service scenarios; the other, “Xiaoyi,” recently passed China’s national medical licensing examination with a score 96 points above the passing line, the first robot to do so. As an “AI medical assistant,” it can help doctors capture and analyze patient information and support clinical decision-making.

At the far end of the showroom, a car cockpit draws the attention of many visitors. It demonstrates how iFLYTEK’s voice technology is deployed in in-vehicle systems, a domain where voice AI has already found widespread application.
Automakers including Audi, BMW, Mercedes-Benz, General Motors, SAIC, and Chery have partnered with iFLYTEK to bring voice-enabled services to Chinese drivers.
Excerpt 2 · The technical ceiling: accuracy, dialects, and why each extra “1%” is hard.
According to Zhao, iFLYTEK’s Mandarin speech recognition accuracy has now reached 98 percent. When the company first launched its open platform in 2010, that figure hovered between 60 and 70 percent. Years of iterative training and optimization brought it to 97 percent last year. “The closer you get to the ceiling, the harder every step becomes,” Zhao said. Raising accuracy from 97 to 98 percent means cutting the relative error rate by about a third; pushing it from 98 to 99 percent demands cutting what remains in half.

Dialect recognition presents an even steeper challenge. In version 3.0 of the AIUI platform, iFLYTEK opened support for 23 Chinese dialects. More than half now achieve recognition rates above 90 percent. In speech recognition, Zhao explained, the size of the user base matters: the larger the population using a language, the faster the system can be optimized through data. This is why Mandarin, spoken daily by more than 200 million active users on the platform, has improved so rapidly. Dialects with larger speaker populations, such as Cantonese and Sichuanese, also tend to reach higher accuracy levels than those with fewer speakers.
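The arithmetic behind Zhao’s point is easy to verify: each percentage point of accuracy near the ceiling requires eliminating a larger fraction of the errors that remain. A minimal sketch (the function name is illustrative, not iFLYTEK’s):

```python
def relative_error_reduction(acc_before: float, acc_after: float) -> float:
    """Fraction by which the error rate shrinks when accuracy improves.

    Error rate is simply 1 - accuracy; the relative reduction measures
    how much of the *remaining* error must be eliminated.
    """
    err_before = 1.0 - acc_before
    err_after = 1.0 - acc_after
    return (err_before - err_after) / err_before

# 97% -> 98% accuracy: error falls from 3% to 2%, about a one-third reduction
print(round(relative_error_reduction(0.97, 0.98), 2))  # 0.33

# 98% -> 99% accuracy: error falls from 2% to 1%, a one-half reduction
print(round(relative_error_reduction(0.98, 0.99), 2))  # 0.5
```

Each further percentage point thus demands proportionally more of the remaining errors be fixed, which is why the last steps toward the ceiling are the hardest.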