Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought
Authors: Tencent Hunyuan Team, Ao Liu, Botong Zhou, Can Xu, Chayse Zhou, ChenChen Zhang, Chengcheng Xu, Chenhao Wang, Decheng Wu, Dengpeng Wu, Dian Jiao, Dong Du, Dong Wang, Feng Zhang, Fengzong Lian, Guanghui Xu, Guanwei Zhang, Hai Wang, Haipeng Luo, Han Hu, Huilin Xu, Jiajia Wu, Jianchen Zhu, Jianfeng Yan, Jiaqi Zhu, Jihong Zhang, Jinbao Xue, Jun Xia, Junqiang Zheng, Kai Liu, Kai Zhang, Kai Zheng, Kejiao Li, Keyao Wang, Lan Jiang, Lixin Liu, Lulu Wu, Mengyuan Huang, Peijie Yu, Peiqi Wang, Qian Wang, Qianbiao Xiang, Qibin Liu, Qingfeng Sun, Richard Guo, Ruobing Xie, Saiyong Yang, Shaohua Chen, Shihui Hu, Shuai Li, Shuaipeng Li, Shuang Chen, Suncong Zheng, Tao Yang, Tian Zhang, Tinghao Yu, Weidong Han, Weijie Liu, Weijin Zhou, Weikang Wang, Wesleye Chen, Xiao Feng, Xiaoqin Ren, Xingwu Sun, Xiong Kuang, Xuemeng Huang, Xun Cao, Yanfeng Chen, Yang Du, Zhen Yang, Yangyu Tao, Yaping Deng, Yi Shen, Yigeng Hong, Yiqi Chen, Yiqing Huang, Yuchi Deng, Yue Mao, Yulong Wang, Yuyuan Zeng, Zenan Xu, Zhanhui Kang, Zhe Zhao, ZhenXiang Yan, Zheng Fang, Zhichao Hu, Zhongzhi Chen, Zhuoyu Li, Zongwei Li, Alex Yan, Ande Liang, Baitong Liu, Beiping Pan, Bin Xing, Binghong Wu, Bingxin Qu, Bolin Ni, Boyu Wu, Chen Li, Cheng Jiang, Cheng Zhang, Chengjun Liu, Chengxu Yang, Chengzhong Xu, Chiyu Wang, Chong Zha, Daisy Yi, Di Wang, Fanyang Lu, Fei Chen, Feifei Liu, Feng Zheng, Guanghua Yu, Guiyang Li, Guohua Wang, Haisheng Lin, Han Liu, Han Wang, Hao Fei, Hao Lu, Haoqing Jiang, Haoran Sun, Haotian Zhu, Huangjin Dai, Huankui Chen, Huawen Feng, Huihui Cai, Huxin Peng, Jackson Lv, Jiacheng Shi, Jiahao Bu, Jianbo Li, Jianglu Hu, Jiangtao Guan, Jianing Xu, Jianwei Cai, Jiarong Zhang, Jiawei Song, Jie Jiang, Jie Liu, Jieneng Yang, Jihong Zhang, Jin lv, Jing Zhao, Jinjian Li, Jinxing Liu, Jun Zhao, Juntao Guo, Kai Wang, Kan Wu, Lei Fu, Lei He, Lei Wang, Li Liu, Liang Dong, Liya Zhan, Long Cheng, Long Xu, Mao Zheng, Meng Liu, Mengkang Hu, Nanli Chen, Peirui Chen, Peng He, Pengju Pan, Pengzhi Wei, Qi Yang, Qi Yi, Roberts Wang, Rongpeng Chen, Rui Sun, Rui Yang, Ruibin Chen, Ruixu Zhou, Shaofeng Zhang, Sheng Zhang, Shihao Xu, Shuaishuai Chang, Shulin Liu, SiQi Wang, Songjia Feng, Songling Yuan, Tao Zhang, Tianjiao Lang, Tongkai Li, Wei Deng, Wei Li, Weichao Wang, Weigang Zhang, Weixuan Sun, Wen Ouyang, Wenxiang Jiao, Wenzhi Sun, Wenzhuo Jia, Xiang Zhang, Xiangyu He, Xianshun Ren, XiaoYing Zhu, Xiaolong Guo, Xiaoxue Li, Xiaoyu Ma, Xican Lu, Xinhua Feng, Xinting Huang, Xinyu Guan, Xirui Li, Xu Zhang, Xudong Gao, Xun Luo, Xuxiang Qi, Yangkun Chen, Yangyu Tao, Yanling Xiao, Yantao Mai, Yanze Chen, Yao Ding, Yeting Yang, YiFan Song, Yifan Yang, Yijiao Zhu, Yinhe Wu, Yixian Liu, Yong Yang, Yuanjun Cai, Yuanlin Tu, Yue Zhang, Yufei Huang, Yuhang Zhou, Yuhao Jiang, Yuhong Liu, Yuhui Hu, Yujin Lin, Yun Yang, Yunhao Wang, Yusong Zhang, Zekun Wu, Zelong Zhang, Zhan Yu, Zhaoliang Yang, Zhe Zhao, Zheng Li, Zhenyu Huang, Zhiguang Liu, Zhijiang Xu, Zhiqing Kui, Zhiyin Zeng, Zhiyuan Xiong, Zhuo Han, Zifan Wu, Zigang Geng, Zilong Zhao, Ziyan Tang, Ziyuan Zhu, Zonglei Zhu, Zhijiang Xu
Category: cs.CL
Published: 2025-05-21 (Updated: 2025-07-04)
💡 One-Sentence Takeaway
Hunyuan-TurboS advances large language models through Mamba-Transformer synergy and an adaptive chain-of-thought (CoT) mechanism.
🎯 Matched Areas: Pillar 2: RL Algorithms & Architecture (RL & Architecture); Pillar 9: Embodied Foundation Models
Keywords: large language models, Mamba, Transformer, hybrid architecture, adaptive CoT, long-sequence modeling, Mixture of Experts, reinforcement learning
📋 Key Points
- Existing large language models struggle to reconcile computational efficiency with contextual understanding when processing long sequences.
- Hunyuan-TurboS combines the strengths of Mamba and Transformer and introduces an adaptive CoT mechanism, yielding an efficient yet powerful language model.
- Experiments show that Hunyuan-TurboS performs strongly across multiple benchmarks and ranks among the leaders on the LMSYS Chatbot Arena.
🔬 Method Details
Problem definition: When processing long texts, the Transformer's self-attention cost grows quadratically with sequence length, making it inefficient. A second challenge is allocating compute sensibly across tasks of varying difficulty, avoiding both over-reasoning on easy queries and under-reasoning on hard ones.
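To make the scaling gap concrete, here is a back-of-envelope sketch comparing per-layer cost: self-attention scales as O(n²·d), while a Mamba-style selective scan scales as O(n·d·s) for a fixed state size s. The hidden size d=4096 and state size s=128 below are illustrative assumptions, not values reported for Hunyuan-TurboS.

```python
# Rough per-layer FLOP estimates (constants and minor terms dropped).
def attention_flops(n: int, d: int) -> int:
    """Self-attention: two n x n x d matmuls (QK^T and AV) -> O(n^2 d)."""
    return 2 * n * n * d

def ssm_flops(n: int, d: int, s: int = 128) -> int:
    """Mamba-style selective scan with state size s -> O(n d s), linear in n."""
    return 2 * n * d * s

for n in (4_096, 65_536, 262_144):  # up to the 256K context the paper supports
    ratio = attention_flops(n, d=4096) / ssm_flops(n, d=4096)
    print(f"n={n:>7}: attention / scan cost ratio ≈ {ratio:,.0f}x")
```

Under these assumptions the ratio is simply n/s, so at the 256K context length an attention layer is roughly 2,000x more expensive than a linear-time scan layer, which is why mixing the two layer types pays off.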
Core idea: Hunyuan-TurboS combines the strengths of Mamba and Transformer, pairing Mamba's efficient long-sequence processing with the Transformer's strong contextual understanding. On top of this, an adaptive CoT mechanism adjusts reasoning depth to task complexity, striking a balance between performance and efficiency.
Technical framework: Hunyuan-TurboS adopts a hybrid Transformer-Mamba MoE architecture with 128 layers arranged in an AMF/MF block pattern. Mamba2 layers handle efficient long-sequence modeling, Attention layers strengthen contextual understanding, and FFN layers use an MoE structure to increase model capacity. The full training pipeline covers pre-training, supervised fine-tuning, adaptive CoT fusion, multi-round deliberation learning, and reinforcement learning.
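A minimal sketch of how an "AMF/MF" interleaving could be expanded into a flat layer list (A = Attention, M = Mamba2, F = FFN/MoE). The paper states 128 layers and the block pattern but, in this summary, not the block counts, so the counts below are hypothetical choices that merely satisfy 3·num_amf + 2·num_mf = 128.

```python
from typing import List

def build_block_pattern(num_amf: int, num_mf: int) -> List[str]:
    """Expand AMF and MF blocks into a per-layer type list."""
    layers: List[str] = []
    layers += ["A", "M", "F"] * num_amf  # attention-bearing blocks
    layers += ["M", "F"] * num_mf        # attention-free blocks
    return layers

# Hypothetical split reaching the paper's 128 total layers:
# 3 * 20 + 2 * 34 = 128.
pattern = build_block_pattern(num_amf=20, num_mf=34)
assert len(pattern) == 128
print("attention layers:", pattern.count("A"))  # -> 20 of 128
```

The design intuition is that only a minority of layers need full attention for contextual precision, while the bulk of sequence mixing runs through linear-time Mamba2 layers.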
Key innovation: The central contribution is the hybrid architecture, which integrates Mamba and Transformer so that each plays to its strengths. In addition, the adaptive CoT mechanism dynamically adjusts reasoning depth to task difficulty, improving both efficiency and performance.
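To illustrate the dispatch logic of adaptive long-short CoT, here is a minimal sketch assuming a hypothetical difficulty score in [0, 1]; in the actual model the short/long decision is learned during post-training (the Adaptive Long-short CoT Fusion stage), not implemented as a hand-written threshold like this.

```python
from dataclasses import dataclass

@dataclass
class CoTConfig:
    mode: str            # "short" (rapid response) or "long" (deep "thinking")
    max_new_tokens: int  # reasoning budget

def route_query(difficulty: float, threshold: float = 0.5) -> CoTConfig:
    """Pick a reasoning budget from an estimated query difficulty."""
    if difficulty < threshold:
        return CoTConfig(mode="short", max_new_tokens=256)
    return CoTConfig(mode="long", max_new_tokens=8192)

# Easy queries get a quick reply; hard ones get a deep-thinking budget.
print(route_query(difficulty=0.05))  # -> short mode
print(route_query(difficulty=0.90))  # -> long mode
```

The point of the mechanism is resource allocation: simple queries should not pay the latency and token cost of a long reasoning trace.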
Key design: The model has 56B activated parameters (560B total) and supports a 256K context length. The Mamba2 modules have linear complexity, and Grouped-Query Attention minimizes the KV cache. The adaptive CoT fusion method and multi-round deliberation learning further strengthen reasoning, and a two-stage reinforcement learning process targets STEM and general instruction following.
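A rough sketch of why Grouped-Query Attention (GQA) shrinks the KV cache: keys and values are stored per KV head, so sharing each KV head across a group of query heads divides the cache size by the grouping factor. All sizes below are illustrative assumptions (20 attention layers, matching the hypothetical split above; 64 vs. 8 KV heads; fp16); Hunyuan-TurboS's actual head counts are not given in this summary.

```python
def kv_cache_bytes(layers: int, seq_len: int, kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """KV cache size: 2 (K and V) * layers * seq * kv_heads * head_dim."""
    return 2 * layers * seq_len * kv_heads * head_dim * bytes_per_elem

seq, dim = 262_144, 128  # 256K context, a typical per-head dimension
mha = kv_cache_bytes(layers=20, seq_len=seq, kv_heads=64, head_dim=dim)
gqa = kv_cache_bytes(layers=20, seq_len=seq, kv_heads=8,  head_dim=dim)
print(f"MHA: {mha / 2**30:.1f} GiB, GQA: {gqa / 2**30:.1f} GiB "
      f"({mha // gqa}x smaller)")  # -> 160.0 GiB vs 20.0 GiB (8x)
```

At a 256K context the KV cache dominates serving memory, so combining GQA with a mostly-Mamba layer stack (which keeps only a fixed-size state, no KV cache) is what makes long-context deployment economical.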
📊 Experimental Highlights
On the LMSYS Chatbot Arena, Hunyuan-TurboS ranks 7th overall with a score of 1356, ahead of models such as Gemini-2.0-Flash-001 (1352) and o4-mini-2025-04-16 (1345). Across 23 automated benchmarks, it averages 77.9%. These results indicate substantial gains in both performance and efficiency.
🎯 Application Scenarios
Hunyuan-TurboS has broad application potential, including intelligent customer service, text generation, code generation, and knowledge-based question answering. Its efficient inference and strong contextual understanding suit it to a wide range of complex NLP tasks, and it promises to lower inference costs and improve user experience in real deployments.
📄 Abstract (Original)
As Large Language Models (LLMs) rapidly advance, we introduce Hunyuan-TurboS, a novel large hybrid Transformer-Mamba Mixture of Experts (MoE) model. It synergistically combines Mamba's long-sequence processing efficiency with Transformer's superior contextual understanding. Hunyuan-TurboS features an adaptive long-short chain-of-thought (CoT) mechanism, dynamically switching between rapid responses for simple queries and deep "thinking" modes for complex problems, optimizing computational resources. Architecturally, this 56B activated (560B total) parameter model employs 128 layers (Mamba2, Attention, FFN) with an innovative AMF/MF block pattern. Faster Mamba2 ensures linear complexity, Grouped-Query Attention minimizes KV cache, and FFNs use an MoE structure. Pre-trained on 16T high-quality tokens, it supports a 256K context length and is the first industry-deployed large-scale Mamba model. Our comprehensive post-training strategy enhances capabilities via Supervised Fine-Tuning (3M instructions), a novel Adaptive Long-short CoT Fusion method, Multi-round Deliberation Learning for iterative improvement, and a two-stage Large-scale Reinforcement Learning process targeting STEM and general instruction-following. Evaluations show strong performance: overall top 7 rank on LMSYS Chatbot Arena with a score of 1356, outperforming leading models like Gemini-2.0-Flash-001 (1352) and o4-mini-2025-04-16 (1345). TurboS also achieves an average of 77.9% across 23 automated benchmarks. Hunyuan-TurboS balances high performance and efficiency, offering substantial capabilities at lower inference costs than many reasoning models, establishing a new paradigm for efficient large-scale pre-trained models.