Why China’s AI Model-Chip Alliance Could Redefine the Future of Tech

On July 25, during this year’s World AI Conference in Shanghai, StepFun unveiled Step 3, its latest multimodal reasoning model, built on a Mixture-of-Experts architecture with 321B total parameters and 38B active. StepFun did more than showcase model performance. It delivered a surprise: Step 3 runs at up to 300% of the inference efficiency of DeepSeek-R1 on domestic chips. This performance, combined with the simultaneous launch of a Model-Chip Ecosystem Innovation Alliance with Infinigence AI, SiliconFlow, Huawei Technologies’ Ascend computing business unit, MetaX, Biren Technology, Enflame, Iluvatar Corex, Cambricon Technologies and Moore Threads, signals a bold step toward full-stack AI infrastructure rooted in Chinese hardware.

Rather than waiting for chips to catch up with models, StepFun flipped the script, designing models around the constraints of domestic chips from the start. Co-founder Zhu Yibo described this shift as necessary because chip design cycles lag far behind the rapid evolution of AI models. The company’s Multi-matrix Factorization Attention (MFA) mechanism, released earlier this year, reduced inference (key-value) cache requirements by 93.7%, making Step 3 far more compatible with local hardware than rivals like DeepSeek.
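
To make the 93.7% figure concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a standard attention cache that stores one key and one value vector per head, per layer, per token. The model dimensions are hypothetical placeholders chosen for illustration and are not Step 3's actual configuration.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, bytes_per_value=2):
    """Approximate size of a standard attention cache (keys plus values)."""
    # The leading 2 accounts for storing both a key and a value tensor.
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_value


def apply_reduction(baseline_bytes, reduction_pct):
    """Cache size remaining after a percentage reduction."""
    return baseline_bytes * (1 - reduction_pct / 100)


# Hypothetical baseline model, NOT Step 3's real dimensions.
baseline = kv_cache_bytes(layers=60, kv_heads=64, head_dim=128,
                          seq_len=32768, batch=1)

# The 93.7% reduction is the figure reported in the article.
reduced = apply_reduction(baseline, 93.7)
shrink_factor = baseline / reduced

GIB = 2 ** 30
print(f"baseline cache: {baseline / GIB:.1f} GiB")
print(f"reduced cache:  {reduced / GIB:.2f} GiB")
print(f"shrink factor:  {shrink_factor:.1f}x")
```

Under these assumed dimensions, a cache of about 60 GiB per 32K-token sequence shrinks to under 4 GiB, roughly a 16-fold reduction. That matters most on accelerators with limited memory capacity and bandwidth, which is the constraint the article attributes to domestic chips.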

While DeepSeek has dominated headlines with its early success, it has stumbled recently. Its R2 model was planned for release in May, but delays continue, likely due to NVIDIA chip shortages and export controls. DeepSeek’s reliance on NVIDIA’s GPU ecosystem, bolstered by stockpiles amassed in 2021, has become a vulnerability. Meanwhile, Chinese alternatives like Huawei’s Ascend 910B are proving increasingly viable, with models running on them even outperforming DeepSeek-R1 in some tests.

At the core of StepFun’s approach is a philosophy that models and chips must evolve as one system. Step 3 isn’t just faster—it’s designed for Chinese chip strengths and weaknesses, such as limited HBM performance and manufacturing constraints. Performance demos show Step 3 running more efficiently on Ascend 910B than even Huawei’s own Pangu Pro MoE model, despite having more than double the active parameters.

The alliance StepFun launched is more than symbolic. Its goal is to align chip and model development timelines and share early-stage designs across companies. This level of collaboration could give StepFun first-mover advantages in R&D and bring deeper integration between AI and hardware than China has previously seen.

Despite these advances, training models on Chinese chips remains a major challenge. The U.S. still leads in large-scale GPU clusters, with domestic training relying heavily on NVIDIA infrastructure. Projects like Feixing-2, a joint effort by iFlyTek and Huawei, represent progress but still lag in total computing power and stability. Beyond hardware, rebuilding toolchains from scratch to match chip architecture is a costly and expertise-heavy process. The dominance of NVIDIA’s CUDA remains a formidable advantage in the training phase.

But the race isn’t over. If Chinese players can lead the next technological paradigm, there’s room to leap ahead. That opportunity is likely to emerge from multimodal AI. While multimodal tools are widespread, the real GPT-4 moment for multimodal hasn’t arrived—current systems still lack deep integration of vision, language, audio, and reasoning. StepFun is betting on this future, and early results suggest it’s paying off.

The company recently announced projected annual revenue of €120 million, making it one of the few large-model players to reveal commercial performance. It has released over a dozen multimodal models in a year, spanning speech, image generation, video, and music. Its integrated model, Step 3o Vision, and its end-to-end speech model, Step-Audio 2, represent the next wave of intelligent applications. From glare-resistant menu recognition to smart cockpits co-developed with Geely, StepFun is turning multimodal research into usable, profitable products. Its intelligent terminal agents are already embedded in over half of China’s top smartphone brands, helping define the user interface of next-generation AI.

This practical focus is helping StepFun create a flywheel effect—more deployment leads to more data, which improves models, which draws more users. Industry leaders agree: aligning chip makers, model builders, and end-users through unified standards is essential to reducing costs and accelerating progress.

One last detail stands out: of all companies in the model-chip alliance, half are based in Shanghai. Long overshadowed in the internet era, Shanghai is now leading China’s AI transformation. The city’s strength lies in hardware-software synergy. With top wafer fabs, HBM packaging capacity, and a robust application ecosystem, Shanghai is ideally positioned for AI industrialization. It’s home to over 24,000 AI companies. The local government is doubling down—state capital is flowing into early-stage AI investments, including StepFun’s upcoming financing round.

Source: Guancha, Shanghai Gov, Souhu, CGTN, InfoTechLead