Arm and Alibaba Collaborate: KleidiAI Powers MNN for a Leap in On-Device Multimodal AI Performance

Arm Holdings plc and Alibaba's Taobao Tmall Group recently announced a significant partnership that integrates Arm KleidiAI into MNN, Alibaba's lightweight deep learning framework. The integration substantially improves the performance of multimodal AI workloads running on mobile devices powered by Arm CPUs. The collaboration centers on optimizing Alibaba's instruction-tuned Qwen2-VL-2B-Instruct model with Arm KleidiAI, enabling efficient on-device image understanding, text-to-image inference, and cross-lingual multimodal generation.

The collaboration was showcased at Mobile World Congress (MWC) 2025. Running on a smartphone equipped with MediaTek's Dimensity 9400 system-on-chip (SoC), the model understood various combinations of visual and textual inputs and accurately summarized and described image content. The demonstration highlighted both the potential of on-device multimodal AI and the effectiveness of Arm KleidiAI in optimizing model performance.

Stefan Rosinger, Senior Director of Product Management, Arm Client Line of Business, commented: "We are in the midst of an AI revolution, and the rise of multimodal AI models is particularly noteworthy. These models can process and understand various data types, including text, images, audio, video, and sensor data. However, deploying these advanced multimodal models on-device presents significant challenges due to hardware power limitations, memory constraints, and the complexity of handling multiple data types."

Arm developed KleidiAI to address these challenges. It is a lightweight, high-performance, open-source set of Arm routines designed for AI acceleration. KleidiAI is integrated into the latest versions of leading on-device AI frameworks, including ExecuTorch, Llama.cpp, LiteRT (via XNNPACK), and MediaPipe, so millions of developers automatically receive significant performance improvements without extra effort. Because the optimizations arrive through these frameworks, AI inference workloads that run on Arm CPUs through them benefit seamlessly, lowering the barrier to entry and boosting developer efficiency.

After integrating KleidiAI with MNN, Arm and the MNN team ran performance tests on the Qwen2-VL-2B-Instruct model. The results showed clear gains in both speed and responsiveness in key on-device multimodal AI applications, which translates into a noticeably better user experience across Alibaba's many customer-facing applications.

Specifically, the tests showed improvements in two key areas: a 57% improvement in model pre-fill performance and a 28% improvement in decoding performance. Pre-fill is the stage in which the model processes the prompt input before it begins generating a response; decoding is the stage in which the model then generates the output text, one token at a time. Faster pre-fill shortens the wait before the first words of a response appear, and faster decoding speeds up the rest of the response, so gains in these two stages lead directly to faster response times and smoother user interaction.
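To make the two percentages more concrete, the following Python sketch estimates how they might change response latency. It assumes that each reported gain is a throughput increase (tokens per second multiplied by 1.57 for pre-fill and 1.28 for decoding) and uses a simplified model in which the time to the first token equals the pre-fill time. The baseline throughputs, prompt length, and output length are hypothetical values chosen for illustration, not figures from the Arm and MNN tests.

```python
# Minimal latency model for a two-stage LLM response (pre-fill + decode).
# All baseline numbers below are HYPOTHETICAL and for illustration only;
# they are not measurements from the Arm/MNN tests.

BASELINE_PREFILL_TPS = 100.0  # prompt tokens processed per second (assumed)
BASELINE_DECODE_TPS = 10.0    # output tokens generated per second (assumed)

# Reported gains, interpreted here as throughput increases (an assumption).
PREFILL_GAIN = 0.57
DECODE_GAIN = 0.28


def response_latency(prompt_tokens, output_tokens, prefill_tps, decode_tps):
    """Return (time_to_first_token, total_time) in seconds.

    Simplified model: the whole prompt is processed during pre-fill, and the
    first token is treated as available when pre-fill finishes. The total time
    adds the decode time for all output tokens at a constant rate.
    """
    time_to_first_token = prompt_tokens / prefill_tps
    total_time = time_to_first_token + output_tokens / decode_tps
    return time_to_first_token, total_time


PROMPT_TOKENS = 500  # hypothetical prompt length (text plus image tokens)
OUTPUT_TOKENS = 100  # hypothetical response length

before = response_latency(PROMPT_TOKENS, OUTPUT_TOKENS,
                          BASELINE_PREFILL_TPS, BASELINE_DECODE_TPS)
after = response_latency(PROMPT_TOKENS, OUTPUT_TOKENS,
                         BASELINE_PREFILL_TPS * (1 + PREFILL_GAIN),
                         BASELINE_DECODE_TPS * (1 + DECODE_GAIN))

print(f"before: first token {before[0]:.2f} s, full response {before[1]:.2f} s")
print(f"after:  first token {after[0]:.2f} s, full response {after[1]:.2f} s")
```

With these assumed inputs, the first-token time falls from 5.00 s to 3.18 s and the full-response time falls from 15.00 s to 11.00 s. Because pre-fill time scales with prompt length, the pre-fill gain matters most for long prompts. In a vision-language model such as Qwen2-VL, an input image is typically converted into a large number of tokens that are processed during pre-fill.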

KleidiAI integration also makes on-device AI processing more efficient by reducing the overall computational cost of multimodal workloads. This matters for battery life and user experience, because efficient use of computational resources is paramount on resource-constrained mobile devices.

Beyond the technical gains, the Arm and Alibaba collaboration brings tangible benefits for developers. Millions of developers who build applications and workloads on popular AI frameworks, including MNN, will see performance and efficiency gains in edge device applications. This should accelerate the adoption of on-device AI applications, bringing AI technology into more sectors and giving users a more intelligent and convenient experience.

The open-source nature of Arm KleidiAI also contributes to the growth of the AI community. Its permissive license lets developers freely use, improve, and share the technology, fostering innovation and collaboration in on-device AI. This open ecosystem further lowers the barrier to AI development and draws more developers into building on-device AI applications.

In conclusion, the Arm and Alibaba partnership marks a significant step forward for on-device multimodal AI. Integrating Arm KleidiAI with the MNN framework delivers significant performance improvements, resulting in a better user experience and giving developers more powerful tools and greater opportunities. The collaboration points toward on-device AI that is more efficient and more capable, opening new possibilities for a range of applications. As AI technology continues to evolve, on-device AI will become increasingly important, and the collaboration between Arm and Alibaba is well placed to play a major role in that shift. It also offers a useful case study for other AI frameworks and hardware vendors. More collaborations of this kind can be expected, driving the development and widespread adoption of on-device AI, improving user experiences, and transforming various industries. This collaborative model deserves emulation as a way to build a more prosperous AI ecosystem.