cs.LG (2026-02-05)

📊 48 papers | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (25) · Pillar 9: Embodied Foundation Models (21, 🔗 5) · Pillar 5: Interaction & Reaction (1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 2: RL Algorithms & Architecture (25 papers)

| # | Title | One-line Summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 1 | HealthMamba: An Uncertainty-aware Spatiotemporal Graph State Space Model for Effective and Reliable Healthcare Facility Visit Prediction | HealthMamba: an uncertainty-aware spatiotemporal graph state space model for effective and reliable healthcare facility visit prediction. | Mamba · state space model · spatiotemporal | |
| 2 | Constrained Group Relative Policy Optimization | Proposes Constrained GRPO, a critic-free policy optimization method for constrained problems, improving robot task performance. | policy learning · embodied AI · foundation model | |
| 3 | Path-Guided Flow Matching for Dataset Distillation | Proposes path-guided flow matching for efficient dataset distillation, improving downstream generalization. | flow matching · distillation | |
| 4 | Disentangled Representation Learning via Flow Matching | Proposes a flow-matching-based disentangled representation learning framework, improving semantic alignment and disentanglement. | flow matching · representation learning | |
| 5 | Back to Basics: Revisiting Exploration in Reinforcement Learning for LLM Reasoning via Generative Probabilities | Proposes the ARM mechanism, which uses generative probabilities to improve RL exploration in LLM reasoning and increase diversity. | reinforcement learning · large language model | |
| 6 | Data-Centric Interpretability for LLM-based Multi-Agent Reinforcement Learning | Proposes Meta-Autointerp, a data-centric interpretability method for LLM-based multi-agent reinforcement learning. | reinforcement learning · large language model | |
| 7 | On Computation and Reinforcement Learning | Proposes a computation-constrained policy framework that improves the performance and generalization of RL policies. | reinforcement learning · offline RL | |
| 8 | Distributional Reinforcement Learning with Diffusion Bridge Critics | Proposes DBC, a distributional RL method with diffusion bridge critics, improving continuous-control performance. | reinforcement learning · diffusion policy | |
| 9 | Rewards as Labels: Revisiting RLVR from a Classification Perspective | Proposes the REAL framework, which treats verifiable rewards as labels to address gradient misallocation and gradient domination in RL. | reinforcement learning · policy learning · large language model | |
| 10 | Mode-Dependent Rectification for Stable PPO Training | Proposes mode-dependent rectification to stabilize PPO training in visual reinforcement learning. | reinforcement learning · PPO | |
| 11 | $f$-GRPO and Beyond: Divergence-Based Reinforcement Learning Algorithms for General LLM Alignment | Proposes f-divergence-based LLM alignment algorithms, improving performance on general alignment tasks. | reinforcement learning | |
| 12 | Verification of the Implicit World Model in a Generative Model via Adversarial Sequences | Proposes an adversarial sequence generation method to verify a generative model's implicit world model in the chess domain. | world model | |
| 13 | Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations | Proposes Dr. Kernel, which uses RL to optimize Triton kernel generation, outperforming existing LLMs. | reinforcement learning | |
| 14 | Cross-Domain Offline Policy Adaptation via Selective Transition Correction | Proposes Selective Transition Correction (STC) to address dynamics mismatch in cross-domain offline policy adaptation. | reinforcement learning · policy learning · offline RL | |
| 15 | Learning to Inject: Automated Prompt Injection via Reinforcement Learning | Proposes AutoInject, which uses RL to automatically generate prompt-injection attacks with higher success rates and transferability. | reinforcement learning | |
| 16 | CSRv2: Unlocking Ultra-Sparse Embeddings | CSRv2: unlocks ultra-sparse embeddings for efficient, high-performing text and vision representations. | representation learning · foundation model | |
| 17 | Steering Large Reasoning Models towards Concise Reasoning via Flow Matching | FlowSteer: steers large reasoning models toward more concise reasoning via flow matching. | flow matching | |
| 18 | A Unified Framework for Rethinking Policy Divergence Measures in GRPO | Proposes a unified clipping framework for optimizing policy divergence measures in GRPO. | reinforcement learning · large language model | |
| 19 | When Are RL Hyperparameters Benign? A Study in Offline Goal-Conditioned RL | Shows that hyperparameter sensitivity in offline goal-conditioned RL is not inevitable, guiding objective design. | reinforcement learning · deep reinforcement learning · representation learning | |
| 20 | A Decomposition-based State Space Model for Multivariate Time-Series Forecasting | DecompSSM: a decomposition-based state space model for multivariate time-series forecasting. | state space model | |
| 21 | Accelerated Sequential Flow Matching: A Bayesian Filtering Perspective | Proposes an accelerated sequential flow matching method from a Bayesian filtering perspective, improving real-time sequence prediction efficiency. | flow matching | |
| 22 | ZeroS: Zero-Sum Linear Attention for Efficient Transformers | Proposes ZeroS, a zero-sum linear attention mechanism that improves Transformer efficiency and performance. | linear attention | |
| 23 | Formal Synthesis of Certifiably Robust Neural Lyapunov-Barrier Certificates | Proposes certifiably robust neural Lyapunov-barrier certificates to handle dynamics uncertainty. | reinforcement learning · deep reinforcement learning | |
| 24 | DFPO: Scaling Value Modeling via Distributional Flow towards Robust and Generalizable LLM Post-Training | DFPO: scales value modeling via distributional flow for robust and generalizable LLM post-training. | reinforcement learning · PPO | |
| 25 | Variance Reduction Based Experience Replay for Policy Optimization | Proposes a variance-reduction-based experience replay method, improving the efficiency of RL policy optimization. | reinforcement learning · policy learning | |

🔬 Pillar 9: Embodied Foundation Models (21 papers)

| # | Title | One-line Summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 26 | Empowering Time Series Analysis with Large-Scale Multimodal Pretraining | Proposes HORAI, a time-series analysis framework built on large-scale multimodal pretraining, improving zero-shot generalization. | foundation model · multimodal | |
| 27 | Alignment Verifiability in Large Language Models: Normative Indistinguishability under Behavioral Evaluation | Shows that finite behavioral evaluation cannot uniquely verify the latent alignment of large language models. | large language model | |
| 28 | End-to-End Compression for Tabular Foundation Models | Proposes TACO, an end-to-end tabular data compression model that accelerates tabular foundation model inference. | foundation model | |
| 29 | OpenMAG: A Comprehensive Benchmark for Multimodal-Attributed Graph | OpenMAG: a comprehensive benchmark for multimodal-attributed graph learning. | multimodal | |
| 30 | Assessing Electricity Demand Forecasting with Exogenous Data in Time Series Foundation Models | Assesses the impact of exogenous data on electricity demand forecasting with time-series foundation models. | foundation model | |
| 31 | TADS: Task-Aware Data Selection for Multi-Task Multimodal Pre-Training | Proposes TADS, task-aware data selection for multi-task multimodal pre-training. | multimodal | |
| 32 | PhysicsAgentABM: Physics-Guided Generative Agent-Based Modeling | PhysicsAgentABM: generative agent-based modeling with physics priors, improving scalability and calibration. | large language model · multimodal | |
| 33 | Layer-wise LoRA fine-tuning: a similarity metric approach | Proposes layer-wise LoRA fine-tuning that selects key layers via a similarity metric, reducing compute cost. | large language model · multimodal | |
| 34 | Correctness-Optimized Residual Activation Lens (CORAL): Transferrable and Calibration-Aware Inference-Time Steering | Proposes CORAL, a correctness-optimized residual activation lens that improves LLM calibration and accuracy at inference time. | large language model | |
| 35 | Diffusion Model's Generalization Can Be Characterized by Inductive Biases toward a Data-Dependent Ridge Manifold | Characterizes diffusion model generalization via inductive biases toward a data-dependent ridge manifold. | multimodal | |
| 36 | Inverse Depth Scaling From Most Layers Being Similar | Finds an inverse relationship between LLM depth and loss, attributed to ensemble averaging of similar layers rather than compositional learning. | large language model | |
| 37 | Orthogonal Model Merging | Proposes OrthoMerge, which merges LLMs on a Riemannian manifold to preserve geometric structure. | large language model | |
| 38 | Transformers Are Born Biased: Structural Inductive Biases at Random Initialization and Their Practical Consequences | Reveals structural inductive biases of Transformers at random initialization and their practical consequences. | large language model | |
| 39 | Regularized Calibration with Successive Rounding for Post-Training Quantization | Proposes a post-training quantization method with regularized calibration and successive rounding, improving LLM performance. | large language model | |
| 40 | DLM-Scope: Mechanistic Interpretability of Diffusion Language Models via Sparse Autoencoders | DLM-Scope: the first sparse-autoencoder-based interpretability framework for diffusion language models. | large language model | |
| 41 | Where Does Warm-Up Come From? Adaptive Scheduling for Norm-Constrained Optimizers | Proposes an adaptive warm-up schedule for norm-constrained optimizers, improving LLM pre-training. | large language model | |
| 42 | Detecting Misbehaviors of Large Vision-Language Models by Evidential Uncertainty Quantification | Proposes Evidential Uncertainty Quantification (EUQ) to detect misbehaviors of large vision-language models. | multimodal | |
| 43 | BLITZRANK: Principled Zero-shot Ranking Agents with Tournament Graphs | Proposes BLITZRANK, tournament-graph-based zero-shot ranking agents, improving ranking efficiency and accuracy. | large language model | |
| 44 | Hybrid Gated Flow (HGF): Stabilizing 1.58-bit LLMs via Selective Low-Rank Correction | Proposes Hybrid Gated Flow (HGF), stabilizing 1.58-bit LLMs via selective low-rank correction. | large language model | |
| 45 | Faithful Bi-Directional Model Steering via Distribution Matching and Distributed Interchange Interventions | Proposes Concept DAS, achieving faithful bi-directional model steering via distribution matching and distributed interchange interventions. | chain-of-thought | |
| 46 | Double-P: Hierarchical Top-P Sparse Attention for Long-Context LLMs | Proposes Double-P, hierarchical top-p sparse attention that accelerates long-context LLM inference. | large language model | |

🔬 Pillar 5: Interaction & Reaction (1 paper)

| # | Title | One-line Summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 47 | Robust Federated Learning via Byzantine Filtering over Encrypted Updates | Proposes a federated learning method combining homomorphic encryption and Byzantine filtering, enhancing privacy and robustness. | OMOMO | |

🔬 Pillar 8: Physics-based Animation (1 paper)

| # | Title | One-line Summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 48 | Extreme Weather Nowcasting via Local Precipitation Pattern Prediction | exPreCast: an extreme weather nowcasting framework based on local precipitation pattern prediction. | spatiotemporal | |
