TALAN: Task-Aligned Latent Adaptation Networks for Targeted Post-Training of Large Language Models

📄 arXiv: 2606.06902v1

Authors: Chengkai Zhang, Ziteng Liu, Junpu Wang, Zeyi Tao, Yang Wang, Sagar Chordia, Qin Huang

Categories: cs.LG

Published: 2026-06-05


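💡 One-sentence takeaway

Proposes TALAN, a latent side path co-trained with a low-rank adapter, for targeted post-training of large language models.

🎯 Matched area: Pillar 9: Embodied Foundation Models

Keywords: large language models, post-training, task adaptation, low-rank adapters, activation intervention, performance improvement, natural language processing, code generation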

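📋 Key points

  1. Post-training aimed at one capability often degrades a model's other strengths. Low-rank adapters are efficient but task-global, while activation interventions are input-aware but often need separate probes, vectors, or inference-time steering.
  2. TALAN inserts a sequence-conditioned latent side path into the transformer's residual stream and co-trains it with a low-rank adapter in a single SFT loop.
  3. Across four Qwen3-family backbones and four STEM/code benchmarks, TALAN improves matched LoRA and DoRA baselines (+1.41 and +1.85 points mean gain), with under 1% trainable parameters relative to the backbone and 1.01-1.02x inference overhead versus matched LoRA.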

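📝 Summary

Targeted post-training aims to improve reasoning, math, and code ability without degrading a model's other strengths. Low-rank adapters are efficient but task-global: the same update applies regardless of the input. Activation interventions adapt to the input but often require separate probes, vectors, or inference-time steering. This paper proposes TALAN (Task-Aligned Latent Adaptation Networks), a sequence-conditioned latent side path inserted into the transformer's residual stream and co-trained with a low-rank adapter in one SFT loop. In the reported experiments, TALAN improves matched LoRA and DoRA baselines across four Qwen3-family backbones and four STEM/code benchmarks at a small cost in trainable parameters and inference time.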

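🔬 Method details

Problem definition: The paper addresses targeted adaptation of large language models during post-training. Low-rank adapters are task-global rather than input-aware, and activation interventions usually depend on separate probes, vectors, or inference-time steering, which adds complexity.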

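Core idea: TALAN inserts a sequence-conditioned latent side path into the transformer's residual stream and co-trains it with a low-rank adapter in the same SFT loop, so the adapter supplies a task-level update while the side path adds an input-dependent adjustment.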
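The compress, remix, and write-back flow described in the abstract can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the attention-style pooling, the `memory_queries` slots, the function name `latent_side_path`, and the scalar `alpha` writeback are all hypothetical choices, and nothing here is trained.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(queries, keys, values):
    """Scaled dot-product attention over plain Python lists."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([dot(q, k) * scale for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def latent_side_path(hidden, memory_queries, alpha=0.1):
    """hidden: T x d residual-stream states.
    memory_queries: m x d latent slots (hypothetical; learned in a real model)."""
    # 1) Compress: m latent slots read from the T token states.
    memory = attend(memory_queries, hidden, hidden)
    # 2) Remix: each token reads from memory, giving a token-level perturbation.
    delta = attend(hidden, memory, memory)
    # 3) Write back: residual update scaled by alpha (assumed writeback rule).
    return [[h + alpha * d for h, d in zip(h_row, d_row)]
            for h_row, d_row in zip(hidden, delta)]

random.seed(0)
T, d, m = 5, 4, 2  # toy sizes: 5 tokens, width 4, 2 memory slots
hidden = [[random.gauss(0, 1) for _ in range(d)] for _ in range(T)]
memory_queries = [[random.gauss(0, 1) for _ in range(d)] for _ in range(m)]
updated = latent_side_path(hidden, memory_queries, alpha=0.1)
```

Setting `alpha=0.0` returns the input states unchanged, which is one simple way to make the side path start as a no-op; whether TALAN initializes this way is not stated in the abstract.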

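Framework: TALAN compresses the active sequence into a latent memory, remixes that memory into token-level perturbations, and writes them back through a controlled residual update. It is configured along six axes: insertion location, memory size, mixer, writeback rule, trainability scope, and gradient scale.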
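To make the six axes concrete, they can be gathered in a small configuration object. The sketch below is hypothetical: the class name `SidePathConfig`, the field names, and every default value are placeholders chosen for illustration, since the abstract names the axes but not their settings.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class SidePathConfig:
    """Hypothetical container for the six configuration axes named in the abstract."""
    insertion_layers: tuple = (12,)                  # insertion location (placeholder layer index)
    memory_slots: int = 16                           # memory size (placeholder slot count)
    mixer: str = "attention"                         # mixer (placeholder label)
    writeback: str = "scaled_add"                    # writeback rule (placeholder label)
    trainable_scope: str = "side_path_and_adapter"   # trainability scope (placeholder label)
    grad_scale: float = 1.0                          # gradient scale (placeholder value)

    def __post_init__(self):
        # Basic sanity checks so an invalid configuration fails early.
        if not self.insertion_layers:
            raise ValueError("at least one insertion layer is required")
        if self.memory_slots <= 0:
            raise ValueError("memory_slots must be positive")
        if self.grad_scale <= 0:
            raise ValueError("grad_scale must be positive")

config = SidePathConfig(insertion_layers=(8, 16), memory_slots=32)
axis_names = [f.name for f in fields(SidePathConfig)]
```

Keeping the axes in one frozen object makes it straightforward to log or sweep a configuration alongside the matched adapter settings.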

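Key innovation: Internal-state analyses suggest that TALAN acts as a small, complementary activation intervention. The matched adapter update is 80-1,700x larger than the TALAN perturbation, yet the two directions have near-zero cosine similarity, and per-layer measurements show that this small orthogonal perturbation propagates and amplifies through depth.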
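The two quantities in this comparison, a norm ratio and a cosine similarity, are simple to compute. The sketch below uses made-up toy vectors (not measurements from the paper) chosen so that the adapter update is about 100x larger than the side-path perturbation and exactly orthogonal to it.

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))

# Toy vectors for illustration only; these are not values from the paper.
adapter_update = [3.0, 4.0, 0.0]       # large update from the low-rank adapter
side_perturbation = [0.0, 0.0, 0.05]   # small perturbation from the side path

norm_ratio = norm(adapter_update) / norm(side_perturbation)  # about 100
direction_cosine = cosine(adapter_update, side_perturbation)  # 0.0 (orthogonal)
```

A large norm ratio together with a cosine near zero is what it means for the perturbation to be small but complementary: it changes the activations along a direction the adapter update barely touches.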

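Key design: The main design choices are how the latent memory is configured, how the mixer turns memory into perturbations, and which writeback rule applies them. The reported cost is under 1% trainable parameters relative to the backbone and 1.01-1.02x inference overhead versus matched LoRA.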

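📊 Experimental highlights

Across four Qwen3-family backbones and four STEM/code benchmarks, TALAN with LoRA yields a +1.41 point cross-model mean gain, positive on all four backbones and non-negative on all 16 model-benchmark cells. With DoRA it yields a +1.85 point mean gain, positive on all backbones and on 13 of 16 cells. Paired seed checks support positive average effects but show nontrivial variance, so the authors treat them as sensitivity checks. A Llama-3.2-1B transfer probe is also positive under LoRA and rsLoRA across seven paired seeds.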
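Summary statistics of this kind (a cross-model mean gain and counts of positive or non-negative cells) come from a grid of per-cell score differences. The sketch below shows one way to compute them, assuming the cross-model mean is the average of per-backbone means. The grid values and the names `backbone_A` through `backbone_D` are made-up placeholders, not the paper's results.

```python
def summarize(deltas):
    """deltas: backbone name -> list of (method - baseline) score differences,
    one per benchmark."""
    per_backbone = {name: sum(cells) / len(cells) for name, cells in deltas.items()}
    all_cells = [c for cells in deltas.values() for c in cells]
    return {
        "cross_model_mean": sum(per_backbone.values()) / len(per_backbone),
        "positive_backbones": sum(1 for v in per_backbone.values() if v > 0),
        "positive_cells": sum(1 for c in all_cells if c > 0),
        "non_negative_cells": sum(1 for c in all_cells if c >= 0),
        "total_cells": len(all_cells),
    }

# Placeholder 4 x 4 grid for illustration only; not data from the paper.
placeholder_deltas = {
    "backbone_A": [1.0, 0.0, 2.0, 1.0],
    "backbone_B": [0.5, 1.5, -0.5, 0.5],
    "backbone_C": [2.0, 1.0, 0.0, 1.0],
    "backbone_D": [0.0, 3.0, 1.0, 2.0],
}
summary = summarize(placeholder_deltas)
```

Reporting positive and non-negative counts separately matters because a cell with zero change counts toward the second but not the first, which is the distinction the highlights draw between the LoRA and DoRA results.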

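🎯 Application scenarios

TALAN may be useful wherever a model is post-trained for reasoning, math, or code, for example in assistants and programming tools that need stronger task performance without losing existing strengths. The authors also position it as a practical platform for studying steerable activation-level adaptation within standard adapter-based post-training.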

📄 Abstract (original)

Targeted post-training aims to improve reasoning, math, and code without degrading strengths. Low-rank adapters are efficient but task-global; activation interventions are input-aware but often require separate probes, vectors, or inference-time steering. We introduce TALAN (Task-Aligned Latent Adaptation Networks), a sequence-conditioned latent side path inserted into a transformer's residual stream and co-trained with a low-rank adapter in one SFT loop. TALAN compresses the active sequence into latent memory, remixes it into token-level perturbations, and writes them back through a controlled residual update. It is configured along six axes: insertion location, memory size, mixer, writeback rule, trainability scope, and gradient scale. Across four Qwen3-family backbones and four STEM/code benchmarks, TALAN improves matched LoRA and DoRA baselines. With LoRA, it yields a +1.41 point cross-model mean gain, positive on all four backbones and non-negative on all 16 model-benchmark cells. With DoRA, it yields a +1.85 point mean gain, positive on all backbones and on 13 of 16 cells. Paired seed checks support positive average effects but show nontrivial variance, so we treat them as sensitivity checks. Cost is small: <1% trainable parameters relative to the backbone and 1.01-1.02x inference overhead versus matched LoRA. A Llama-3.2-1B transfer probe is also positive under LoRA and rsLoRA across seven paired seeds, supporting a transfer beyond Qwen. Internal-state analyses suggest TALAN is a small complementary activation intervention. The matched adapter update is 80-1,700x larger than the TALAN perturbation, yet their directions have near-zero cosine; per-layer measurements show this small orthogonal perturbation propagates and amplifies through depth. TALAN offers a practical platform for studying steerable activation-level adaptation within standard adapter-based post-training.