cs.LG (2026-02-02)

📊 68 papers in total | 🔗 9 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (36 🔗4) · Pillar 9: Embodied Foundation Models (25 🔗5) · Pillar 1: Robot Control (3) · Pillar 8: Physics-based Animation (2) · Pillar 7: Motion Retargeting (1) · Pillar 4: Generative Motion (1)

🔬 Pillar 2: RL Algorithms & Architecture (36 papers)

| # | Title | One-line Summary | Tags |
|---|---|---|---|
| 1 | IRIS: Implicit Reward-Guided Internal Sifting for Mitigating Multimodal Hallucination | Mitigates hallucination in multimodal LLMs via implicit reward-guided internal sifting. | DPO, direct preference optimization, large language model |
| 2 | Efficient Epistemic Uncertainty Estimation for Large Language Models via Knowledge Distillation | Proposes an efficient knowledge-distillation-based uncertainty estimator for LLMs, reducing hallucination and improving safety. | distillation, large language model |
| 3 | T-LLM: Teaching Large Language Models to Forecast Time Series via Temporal Distillation | Teaches LLMs to forecast time series through temporal distillation. | distillation, large language model |
| 4 | A Relative-Budget Theory for Reinforcement Learning with Verifiable Rewards in Large Language Model Reasoning | Proposes a relative-budget theory to improve RL with verifiable rewards for LLM reasoning. | reinforcement learning, large language model |
| 5 | Semantic-aware Wasserstein Policy Regularization for Large Language Model Alignment | Proposes semantic-aware Wasserstein policy regularization to improve LLM alignment. | reinforcement learning, RLHF, large language model |
| 6 | From Perception to Action: Spatial AI Agents and World Models | Proposes a unified framework connecting agent capabilities with spatial tasks, addressing perception and action in the physical world. | world model, large language model, symbolic grounding |
| 7 | FORLER: Federated Offline Reinforcement Learning with Q-Ensemble and Actor Rectification | Proposes FORLER to address policy contamination in federated offline RL with low-quality, heterogeneous data. | reinforcement learning, offline RL, offline reinforcement learning |
| 8 | SLIME: Stabilized Likelihood Implicit Margin Enforcement for Preference Optimization | Stabilized likelihood implicit margin enforcement for preference optimization, addressing forgetting and format collapse in LLM alignment. | reinforcement learning, preference learning, RLHF |
| 9 | DCoPilot: Generative AI-Empowered Policy Adaptation for Dynamic Data Center Operations | Uses generative AI for policy adaptation in dynamic data center operations. | reinforcement learning, deep reinforcement learning, DRL |
| 10 | David vs. Goliath: Verifiable Agent-to-Agent Jailbreaking via Reinforcement Learning | Proposes the Slingshot framework for zero-shot agent-to-agent jailbreaking attacks via reinforcement learning. | reinforcement learning, large language model |
| 11 | State Rank Dynamics in Linear Attention LLMs | Characterizes state-rank dynamics in linear-attention LLMs and proposes joint rank-norm pruning to optimize the KV cache. | linear attention, large language model |
| 12 | ECHO-2: A Large Scale Distributed Rollout Framework for Cost-efficient Reinforcement Learning | A large-scale distributed rollout framework for cost-efficient reinforcement learning. | reinforcement learning, large language model |
| 13 | VLM-Guided Experience Replay | Uses VLM-guided experience replay to improve RL sample efficiency and performance. | reinforcement learning, large language model, multimodal |
| 14 | Beyond Mode Elicitation: Diversity-Preserving Reinforcement Learning via Latent Diffusion Reasoner | Proposes LaDi-RL, which augments RL with a latent diffusion reasoner to counter diversity collapse in LLM reasoning. | reinforcement learning, chain-of-thought |
| 15 | ASGMamba: Adaptive Spectral Gating Mamba for Multivariate Time Series Forecasting | Proposes ASGMamba, an adaptive spectral gating Mamba for efficient multivariate time series forecasting. | Mamba, SSM, state space model |
| 16 | Expanding the Capabilities of Reinforcement Learning via Text Feedback | Expands RL capabilities with text feedback to address information scarcity. | reinforcement learning, distillation |
| 17 | Learning While Staying Curious: Entropy-Preserving Supervised Fine-Tuning via Adaptive Self-Distillation for Large Reasoning Models | CurioSFT: entropy-preserving supervised fine-tuning via adaptive self-distillation that improves exploration in large reasoning models. | reinforcement learning, distillation |
| 18 | DIA-CLIP: a universal representation learning framework for zero-shot DIA proteomics | A universal representation learning framework for zero-shot DIA proteomics. | representation learning, contrastive learning |
| 19 | A Provable Expressiveness Hierarchy in Hybrid Linear-Full Attention | Proves an expressiveness hierarchy in hybrid linear-full attention mechanisms. | Mamba, linear attention, large language model |
| 20 | Adaptive Rollout Allocation for Online Reinforcement Learning with Verifiable Rewards | Proposes the VIP algorithm, improving sampling efficiency of online RL with verifiable rewards via adaptive rollout allocation. | reinforcement learning, VIP |
| 21 | Segment to Focus: Guiding Latent Action Models in the Presence of Distractors | MaskLAM: guides latent action models with visual segmentation to cope with background distractors. | reinforcement learning, foundation model |
| 22 | An Empirical Study of World Model Quantization | Systematically studies how post-training quantization of the DINO-WM world model affects visual planning tasks. | world model |
| 23 | Generative Visual Code Mobile World Models | Proposes gWorld, a mobile GUI world model built on renderable code generation, improving mobile GUI agents. | world model |
| 24 | Active Causal Experimentalist (ACE): Learning Intervention Strategies via Direct Preference Optimization | Proposes ACE, which learns causal intervention strategies via direct preference optimization to make experiment design more efficient. | direct preference optimization |
| 25 | Masked Autoencoders as Universal Speech Enhancer | Proposes a masked-autoencoder-based universal speech enhancer with self-supervised learning and multi-scenario adaptation. | masked autoencoder |
| 26 | Didactic to Constructive: Turning Expert Solutions into Learnable Reasoning | Proposes DAIL, which uses a small number of expert solutions to improve LLM reasoning ability and efficiency. | imitation learning, large language model |
| 27 | Self-Supervised Learning from Structural Invariance | Proposes AdaSSL, self-supervised learning from structural invariance that addresses one-to-many mappings. | world model, representation learning, distillation |
| 28 | STILL: Selecting Tokens for Intra-Layer Hybrid Attention to Linearize LLMs | Proposes the STILL framework for efficiently linearizing large language models. | linear attention, large language model |
| 29 | ECHO: Entropy-Confidence Hybrid Optimization for Test-Time Reinforcement Learning | Proposes the ECHO algorithm, addressing rollout collapse and pseudo-label bias in test-time reinforcement learning. | reinforcement learning |
| 30 | Probabilistic Performance Guarantees for Multi-Task Reinforcement Learning | Provides high-confidence performance guarantees for multi-task RL in safety-critical applications. | reinforcement learning |
| 31 | Dissecting Outlier Dynamics in LLM NVFP4 Pretraining | Addresses outliers in LLM NVFP4 pretraining with hot-channel compensation (HCP) and the CHON training recipe. | linear attention, large language model |
| 32 | Designing Time Series Experiments in A/B Testing with Transformer Reinforcement Learning | Designs time series A/B-testing experiments with Transformer reinforcement learning, improving policy evaluation. | reinforcement learning |
| 33 | Beyond Precision: Training-Inference Mismatch is an Optimization Problem and Simple LR Scheduling Fixes It | Proposes dynamic learning-rate scheduling to resolve the training-inference mismatch in RL training of LLMs. | reinforcement learning, large language model |
| 34 | Softmax Linear Attention: Reclaiming Global Competition | Proposes softmax linear attention to restore global competition among tokens. | linear attention |
| 35 | Choice-Model-Assisted Q-learning for Delayed-Feedback Revenue Management | Proposes choice-model-assisted Q-learning for revenue management with delayed feedback. | reinforcement learning, world model |
| 36 | Position: Beyond Model-Centric Prediction -- Agentic Time Series Forecasting | Position paper proposing agentic time series forecasting, shifting from a model-centric paradigm to agent-driven workflows. | reinforcement learning, predictive model |

🔬 Pillar 9: Embodied Foundation Models (25 papers)

| # | Title | One-line Summary | Tags |
|---|---|---|---|
| 37 | Hierarchical Adaptive Eviction for KV Cache Management in Multimodal Language Models | Proposes the HAE framework, which manages the KV cache of multimodal language models via hierarchical adaptive eviction. | large language model, multimodal |
| 38 | No Global Plan in Chain-of-Thought: Uncover the Latent Planning Horizon of LLMs | Tele-Lens exposes the short planning horizon of LLM chains of thought and uses it to improve CoT uncertainty estimation. | large language model, chain-of-thought |
| 39 | SAME: Stabilized Mixture-of-Experts for Multimodal Continual Instruction Tuning | Proposes SAME, a stabilized multimodal mixture-of-experts that counters expert drift in continual instruction tuning. | large language model, multimodal |
| 40 | EvalQReason: A Framework for Step-Level Reasoning Evaluation in Large Language Models | Proposes the EvalQReason framework for step-level reasoning evaluation of LLMs. | large language model |
| 41 | Interpretable Tabular Foundation Models via In-Context Kernel Regression | Proposes KernelICL, strengthening in-context learning on tabular data with interpretable kernel regression. | foundation model |
| 42 | InfoTok: Regulating Information Flow for Capacity-Constrained Shared Visual Tokenization in Unified MLLMs | Proposes InfoTok to regulate information flow for capacity-constrained shared visual tokenization in unified MLLMs. | large language model, multimodal |
| 43 | When Is Rank-1 Enough? Geometry-Guided Initialization for Parameter-Efficient Fine-Tuning | Proposes Gap-Init, a geometry-guided initialization that stabilizes very-low-rank LoRA fine-tuning and improves multimodal LLM performance. | large language model, multimodal |
| 44 | FiLoRA: Focus-and-Ignore LoRA for Controllable Feature Reliance | Instruction-controlled LoRA that regulates which internal features a multimodal model relies on. | foundation model, multimodal |
| 45 | Embedding Perturbation may Better Reflect the Uncertainty in LLM Reasoning | Proposes embedding-perturbation-based uncertainty quantification to better capture uncertainty in LLM reasoning. | large language model |
| 46 | An Empirical Study on Noisy Data and LLM Pretraining Loss Divergence | Studies how noisy data drives LLM pretraining loss divergence, highlighting the roles of noise type, noise amount, and model scale. | large language model |
| 47 | ReasonCACHE: Teaching LLMs To Reason Without Weight Updates | Teaches LLMs to reason without any weight updates. | large language model |
| 48 | An Optimization Method for Autoregressive Time Series Forecasting | Proposes an optimization method for autoregressive time series forecasting that improves long-horizon accuracy. | large language model |
| 49 | Alignment-Aware Model Adaptation via Feedback-Guided Optimization | Proposes alignment-aware model adaptation via feedback-guided optimization, improving safety and reducing hallucination. | foundation model |
| 50 | Prediction-Powered Risk Monitoring of Deployed Models for Detecting Harmful Distribution Shifts | Proposes prediction-powered risk monitoring of deployed models to detect harmful distribution shifts in dynamic environments. | large language model |
| 51 | Co-RedTeam: Orchestrated Security Discovery and Exploitation with LLM Agents | Orchestrates security vulnerability discovery and exploitation with collaborating LLM agents. | large language model |
| 52 | Revisiting Adaptive Rounding with Vectorized Reparameterization for LLM Quantization | Proposes VQRound, adaptive rounding with vectorized reparameterization for more efficient LLM quantization. | large language model |
| 53 | Two-Stage Grid Optimization for Group-wise Quantization of LLMs | Proposes a two-stage grid optimization algorithm that improves the accuracy of group-wise LLM quantization. | large language model |
| 54 | AICD Bench: A Challenging Benchmark for AI-Generated Code Detection | Proposes AICD Bench, a comprehensive and challenging benchmark for evaluating AI-generated code detection. | large language model |
| 55 | Hippasus: Effective and Efficient Automatic Feature Augmentation for Machine Learning Tasks on Relational Data | An effective and efficient framework for automatic feature augmentation in machine learning on relational data. | large language model |
| 56 | On the Limits of Layer Pruning for Generative Reasoning in LLMs | Shows the limits of layer pruning for generative reasoning in LLMs and proposes fine-tuning on self-generated responses. | large language model |
| 57 | IntraSlice: Towards High-Performance Structural Pruning with Block-Intra PCA for LLMs | High-performance structural pruning for LLMs using block-intra PCA compression. | large language model |
| 58 | Self-Consolidation for Self-Evolving Agents | Proposes a self-evolving framework that improves lifelong learning of LLM agents via contrastive reflection and self-consolidation. | large language model |
| 59 | COLT: Lightweight Multi-LLM Collaboration through Shared MCTS Reasoning for Model Compilation | Lightweight multi-LLM collaboration through shared MCTS reasoning for model compilation. | large language model |
| 60 | Internal Flow Signatures for Self-Checking and Refinement in LLMs | Proposes internal flow signatures for LLM self-checking and refinement, improving generation reliability. | large language model |
| 61 | AGT$^{AO}$: Robust and Stabilized LLM Unlearning via Adversarial Gating Training with Adaptive Orthogonality | Proposes the AGT$^{AO}$ framework, achieving robust and stabilized LLM unlearning via adversarial gating training with adaptive orthogonality. | large language model |

🔬 Pillar 1: Robot Control (3 papers)

| # | Title | One-line Summary | Tags |
|---|---|---|---|
| 62 | Zero-Shot Off-Policy Learning | Proposes a zero-shot off-policy learning algorithm based on successor measures and density ratios. | humanoid, reinforcement learning, policy learning |
| 63 | Grounding Generated Videos in Feasible Plans via World Models | Proposes GVP-WM, grounding generated video plans in feasible actions via world models. | manipulation, trajectory optimization, world model |
| 64 | Efficient Adversarial Attacks on High-dimensional Offline Bandits | Proposes efficient adversarial attacks on high-dimensional offline bandit algorithms, exposing their vulnerability. | manipulation, large language model |

🔬 Pillar 8: Physics-based Animation (2 papers)

| # | Title | One-line Summary | Tags |
|---|---|---|---|
| 65 | RIR-Former: Coordinate-Guided Transformer for Continuous Reconstruction of Room Impulse Responses | A coordinate-guided Transformer for continuous reconstruction of room impulse responses. | PULSE |
| 66 | On the Spatiotemporal Dynamics of Generalization in Neural Networks | Proposes the SEAD architecture, achieving length generalization in neural networks by simulating physical constraints. | spatiotemporal |

🔬 Pillar 7: Motion Retargeting (1 paper)

| # | Title | One-line Summary | Tags |
|---|---|---|---|
| 67 | Boundary-Constrained Diffusion Models for Floorplan Generation: Balancing Realism and Diversity | Proposes boundary-constrained diffusion models that balance realism and diversity in floorplan generation. | geometric consistency |

🔬 Pillar 4: Generative Motion (1 paper)

| # | Title | One-line Summary | Tags |
|---|---|---|---|
| 68 | Unifying Masked Diffusion Models with Various Generation Orders and Beyond | Unifies masked diffusion models across generation orders, with a learnable order to improve text generation quality. | MDM |
