Arm collaborates with Alibaba to accelerate the on-device multimodal AI experience through the integration of KleidiAI and the Tongyi Qianwen model

Arm Holdings plc (Nasdaq: ARM, hereinafter “Arm”) today announced a further collaboration with MNN, the lightweight deep learning framework from Alibaba’s Taobao Group. Through the integration of Arm KleidiAI, the two parties have enabled multimodal artificial intelligence (AI) workloads to run on mobile devices powered by Arm CPUs, using Alibaba’s instruction-tuned Tongyi Qianwen Qwen2-VL-2B-Instruct model. This version of the Tongyi Qianwen model is designed for image understanding, text-to-image inference, and multimodal generation across multiple languages on edge devices. The collaboration significantly improves the performance of on-device multimodal AI workloads and delivers a markedly better user experience.


Stefan Rosinger, Senior Director of Product Management in Arm’s Client Line of Business, said, “We are in the midst of an AI revolution and have witnessed firsthand the rise of multimodal AI models. These models can process and understand a variety of data types, including text, images, audio, video, and sensor data. However, the power and memory constraints of the hardware, combined with the complexity of processing multiple data types simultaneously, make deploying these advanced multimodal models on edge devices a significant challenge.”

Arm Kleidi is an ideal solution to these challenges, providing seamless performance optimization for AI inference workloads running on Arm CPUs. KleidiAI is a set of lightweight, high-performance, open-source Arm routines designed specifically for AI acceleration. It has been integrated into the latest versions of mainstream on-device AI frameworks, including ExecuTorch, Llama.cpp, LiteRT (via XNNPACK), and MediaPipe, allowing millions of developers to automatically gain significant AI performance improvements without any additional work.
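For illustration, the snippet below is a minimal sketch of how a developer might run a multimodal prompt through a KleidiAI-enabled build of MNN on an Arm-based device. It assumes MNN’s Python LLM bindings (`MNN.llm`) with a create/load/response-style interface, an exported Qwen2-VL-2B-Instruct model, and an `<img>…</img>` prompt convention; the exact API names, config layout, and prompt format are assumptions and may differ between MNN releases.

```python
# Illustrative sketch only: the MNN.llm interface, config path, and prompt
# markup below are assumptions, not details confirmed by this article.
import MNN.llm as llm

# Load an exported Qwen2-VL-2B-Instruct model (hypothetical config layout).
model = llm.create("./qwen2-vl-2b-instruct/config.json")
model.load()  # weights are loaded; the MNN CPU backend picks up KleidiAI kernels on Arm CPUs

# A multimodal prompt: an image plus a text question about its content.
prompt = "<img>./receipt.jpg</img>What is the total amount on this receipt?"
answer = model.response(prompt)
print(answer)
```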

Accelerating response times for on-device multimodal AI use cases

Following the integration of KleidiAI into MNN, the Arm and MNN teams measured the resulting acceleration of the Qwen2-VL-2B-Instruct model. The results show faster running and response speeds in key on-device multimodal AI scenarios, an improvement that translates into a better user experience across many of Alibaba’s customer-facing applications.

The faster response times in these use cases come from a 57% improvement in model pre-fill performance (the stage in which the AI model processes the prompt input before generating a response) and a 28% improvement in decode performance (the stage in which the model generates text after processing the prompt). In addition, the KleidiAI integration reduces the overall compute cost of multimodal workloads, further promoting efficient processing of AI workloads on edge devices. Millions of developers who use popular AI frameworks, including MNN, can benefit from these performance and efficiency gains in applications and workloads targeting edge devices.
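To put these figures in perspective, the short calculation below converts the reported performance improvements into approximate latency reductions. The baseline timings are illustrative assumptions, not measurements from this article: a 57% higher pre-fill rate cuts time-to-first-token by roughly 36%, and a 28% higher decode rate cuts per-token generation time by roughly 22%.

```python
# Illustrative arithmetic only: the baseline timings below are assumed values.
prefill_speedup = 1.57   # 57% higher pre-fill performance (throughput)
decode_speedup = 1.28    # 28% higher decode performance (throughput)

baseline_prefill_s = 2.0          # assumed time to process the prompt (time-to-first-token)
baseline_decode_s_per_tok = 0.10  # assumed time per generated token

new_prefill_s = baseline_prefill_s / prefill_speedup                # ~1.27 s
new_decode_s_per_tok = baseline_decode_s_per_tok / decode_speedup   # ~0.078 s/token

print(f"Pre-fill: {baseline_prefill_s:.2f} s -> {new_prefill_s:.2f} s "
      f"({(1 - 1/prefill_speedup)*100:.0f}% lower latency)")
print(f"Decode:   {baseline_decode_s_per_tok*1000:.0f} ms/token -> "
      f"{new_decode_s_per_tok*1000:.0f} ms/token "
      f"({(1 - 1/decode_speedup)*100:.0f}% lower latency)")
```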

Xu Dong, General Manager of Alibaba Cloud’s Tongyi large model business, said, “We are very pleased to see the deep technical collaboration between the Tongyi Qianwen large model and the Arm KleidiAI and MNN teams. Through the integration and optimization of the MNN on-device inference framework and Arm KleidiAI, we have achieved a significant reduction in large-model inference latency and a marked improvement in energy efficiency. This groundbreaking collaboration not only validates the practical potential of large models on mobile devices, but also lets users experience the inclusive value of next-generation AI at their fingertips. We look forward to the three parties continuing to work together, pushing the boundaries of compute through technological innovation, and jointly opening a new chapter in on-device intelligence.”

Jiang Xiaotang, head of MNN in Business Technology at Alibaba’s Taotian Group, said, “This deep technical integration between the MNN inference framework and Arm KleidiAI marks a new breakthrough in on-device large-model acceleration. Through our joint optimization of the underlying architecture, the on-device inference efficiency of the Tongyi large model has improved significantly, bridging the gap between limited compute and complex AI capabilities. This achievement reflects not only the MNN team’s technical breakthroughs, but also the power of cross-team collaboration. We look forward to continuing to work together to cultivate the on-device computing ecosystem, so that every mobile device can deliver a smoother, more efficient, and more natural AI experience.”

KleidiAI integration demonstration at MWC

At this year’s Mobile World Congress (MWC), Arm showcased the results of this collaboration at its booth (Hall 2, Booth I60), highlighting how the model understands various combinations of visual and textual inputs and extracts and explains the content of images. The demonstration ran on smartphones powered by the MediaTek Dimensity 9400 mobile system-on-chip (SoC), including the vivo X200 series.

A leap toward the multimodal AI experience

The integration of KleidiAI with the MNN framework, running Alibaba’s Tongyi Qianwen model, has brought significant user-experience improvements to multimodal AI workloads running on Arm CPUs. These experiences are now available on mobile devices, and many customer-facing applications are already benefiting from KleidiAI. Looking ahead, KleidiAI’s seamless optimization of AI workloads will continue to enable developers to deliver more sophisticated multimodal experiences on edge devices, laying the foundation for the next wave of intelligent computing and marking an exciting step in the continued evolution of AI.
