| 1 |
HealthMamba: An Uncertainty-aware Spatiotemporal Graph State Space Model for Effective and Reliable Healthcare Facility Visit Prediction |
Proposes HealthMamba, an uncertainty-aware spatiotemporal graph state space model for effective and reliable healthcare-facility visit prediction. |
Mamba state space model spatiotemporal |
|
|
| 2 |
Constrained Group Relative Policy Optimization |
Proposes Constrained GRPO, addressing critic-free policy optimization under constraints to improve performance on robotic tasks. |
policy learning embodied AI foundation model |
|
|
| 3 |
Path-Guided Flow Matching for Dataset Distillation |
Proposes path-guided Flow Matching for efficient dataset distillation, improving downstream generalization. |
flow matching distillation |
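Several entries in this list (3, 4, 17, 21) build on flow matching. For orientation, here is a minimal numpy sketch of the standard conditional flow-matching objective they share as a starting point, assuming the common linear interpolation path; `velocity_fn` stands in for a learned velocity model and is purely illustrative, not taken from any of the papers:

```python
import numpy as np

def cfm_loss(velocity_fn, x0, x1, t):
    """Conditional flow matching loss for the linear (rectified-flow) path
    x_t = (1 - t) * x0 + t * x1, whose target velocity is x1 - x0.

    velocity_fn(xt, t) is a stand-in for a learned model; x0 is noise,
    x1 is data, t holds one time in [0, 1] per sample.
    """
    t = np.asarray(t, dtype=float).reshape(-1, 1)  # broadcast over feature dim
    xt = (1 - t) * x0 + t * x1                     # point on the probability path
    target = x1 - x0                               # target velocity of the linear path
    pred = velocity_fn(xt, t)
    return np.mean((pred - target) ** 2)           # regress predicted onto target velocity
```

A model that exactly outputs the path velocity drives this loss to zero; training a network on this regression target is the shared starting point that the papers above then specialize.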
|
|
| 4 |
Disentangled Representation Learning via Flow Matching |
Proposes a Flow Matching-based disentangled representation learning framework that improves semantic alignment and disentanglement. |
flow matching representation learning |
|
|
| 5 |
Back to Basics: Revisiting Exploration in Reinforcement Learning for LLM Reasoning via Generative Probabilities |
Proposes the ARM mechanism, which uses generative probabilities to improve RL exploration in LLM reasoning and increase diversity. |
reinforcement learning large language model |
|
|
| 6 |
Data-Centric Interpretability for LLM-based Multi-Agent Reinforcement Learning |
Proposes Meta-Autointerp, a method for data-centric interpretability analysis in LLM-based multi-agent reinforcement learning. |
reinforcement learning large language model |
|
|
| 7 |
On Computation and Reinforcement Learning |
Proposes a computation-constrained policy framework that improves the performance and generalization of RL policies. |
reinforcement learning offline RL |
|
|
| 8 |
Distributional Reinforcement Learning with Diffusion Bridge Critics |
Proposes DBC, a distributional RL method with diffusion bridge critics that improves performance on continuous-control tasks. |
reinforcement learning diffusion policy |
|
|
| 9 |
Rewards as Labels: Revisiting RLVR from a Classification Perspective |
Proposes the REAL framework, which treats verifiable rewards as labels to address gradient misallocation and gradient domination in reinforcement learning. |
reinforcement learning policy learning large language model |
|
|
| 10 |
Mode-Dependent Rectification for Stable PPO Training |
Proposes Mode-Dependent Rectification to stabilize PPO training in visual reinforcement learning. |
reinforcement learning PPO |
|
|
| 11 |
$f$-GRPO and Beyond: Divergence-Based Reinforcement Learning Algorithms for General LLM Alignment |
Proposes f-divergence-based LLM alignment algorithms that improve performance across general alignment tasks. |
reinforcement learning |
|
|
| 12 |
Verification of the Implicit World Model in a Generative Model via Adversarial Sequences |
Proposes an adversarial sequence generation method to verify the implicit world model of a generative model in the chess domain. |
world model |
|
|
| 13 |
Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations |
Proposes Dr. Kernel, which uses reinforcement learning to optimize Triton kernel generation, outperforming existing LLMs. |
reinforcement learning |
|
|
| 14 |
Cross-Domain Offline Policy Adaptation via Selective Transition Correction |
Proposes the Selective Transition Correction (STC) algorithm to address dynamics mismatch in cross-domain offline policy transfer. |
reinforcement learning policy learning offline RL |
|
|
| 15 |
Learning to Inject: Automated Prompt Injection via Reinforcement Learning |
Proposes AutoInject, which uses reinforcement learning to automatically generate prompt injection attacks with improved success rates and transferability. |
reinforcement learning |
|
|
| 16 |
CSRv2: Unlocking Ultra-Sparse Embeddings |
CSRv2: unlocking ultra-sparse embeddings for efficient, high-performance text and visual representations. |
representation learning foundation model |
|
|
| 17 |
Steering Large Reasoning Models towards Concise Reasoning via Flow Matching |
FlowSteer: steering large reasoning models toward more concise reasoning via flow matching. |
flow matching |
|
|
| 18 |
A Unified Framework for Rethinking Policy Divergence Measures in GRPO |
Proposes a unified clipping framework to optimize policy divergence measures in GRPO. |
reinforcement learning large language model |
|
|
| 19 |
When Are RL Hyperparameters Benign? A Study in Offline Goal-Conditioned RL |
Shows that in offline goal-conditioned RL, hyperparameter sensitivity is not inevitable, with findings that inform objective-function design. |
reinforcement learning deep reinforcement learning representation learning |
|
|
| 20 |
A Decomposition-based State Space Model for Multivariate Time-Series Forecasting |
DecompSSM: a decomposition-based state space model for multivariate time-series forecasting. |
state space model |
|
|
| 21 |
Accelerated Sequential Flow Matching: A Bayesian Filtering Perspective |
Proposes an accelerated sequential flow matching method based on Bayesian filtering, improving the efficiency of real-time sequence prediction. |
flow matching |
|
|
| 22 |
ZeroS: Zero-Sum Linear Attention for Efficient Transformers |
Proposes ZeroS, a zero-sum linear attention mechanism that improves Transformer efficiency and performance. |
linear attention |
|
|
| 23 |
Formal Synthesis of Certifiably Robust Neural Lyapunov-Barrier Certificates |
Proposes certifiably robust neural Lyapunov-barrier certificates to address dynamics uncertainty. |
reinforcement learning deep reinforcement learning |
|
|
| 24 |
DFPO: Scaling Value Modeling via Distributional Flow towards Robust and Generalizable LLM Post-Training |
DFPO: scaling value modeling via distributional flow for robust and generalizable LLM post-training. |
reinforcement learning PPO |
|
|
| 25 |
Variance Reduction Based Experience Replay for Policy Optimization |
Proposes a variance-reduction-based experience replay method that improves the efficiency of policy optimization in reinforcement learning. |
reinforcement learning policy learning |
|
|