| # | Title | Summary | Keywords | ✅ |
|---|-------|---------|----------|----|
| 1 | Safe Urban Traffic Control via Uncertainty-Aware Conformal Prediction and World-Model Reinforcement Learning | Proposes the STREAM-RL framework for safe urban traffic control via uncertainty-aware conformal prediction and world-model reinforcement learning. | reinforcement learning, policy learning, PPO | |
| 2 | Training A Foundation Model to Represent Graphs as Vectors | Proposes a method for training graph foundation models that represent graphs as vectors, targeting graph-level tasks such as graph classification and graph clustering. | contrastive learning, foundation model | |
| 3 | Rethinking the Trust Region in LLM Reinforcement Learning | Proposes the DPPO algorithm, which directly estimates policy divergence to improve the stability and efficiency of LLM reinforcement learning. | reinforcement learning, PPO, large language model | |
| 4 | Thickening-to-Thinning: Reward Shaping via Human-Inspired Learning Dynamics for LLM Reasoning | Proposes the T2T dynamic reward framework, which improves LLM reasoning by emulating human learning dynamics. | reinforcement learning, reward shaping, large language model | |
| 5 | EMA Policy Gradient: Taming Reinforcement Learning for LLMs with EMA Anchor and Top-k KL | EMA-PG: improves the stability and performance of LLM reinforcement learning via an EMA anchor and Top-k KL. | reinforcement learning, large language model | ✅ |
| 6 | REDistill: Robust Estimator Distillation for Balancing Robustness and Efficiency | REDistill: a robust estimator distillation method that balances robustness and efficiency. | teacher-student distillation | |
| 7 | Beyond Rewards in Reinforcement Learning for Cyber Defence | Proposes a sparse-reward mechanism to optimize reinforcement learning for cyber defence. | reinforcement learning, deep reinforcement learning | |
| 8 | SAFE: Stable Alignment Finetuning with Entropy-Aware Predictive Control for RLHF | SAFE: stable alignment finetuning for RLHF via entropy-aware predictive control. | PPO, RLHF | ✅ |
| 9 | Stochastic Decision Horizons for Constrained Reinforcement Learning | Proposes a constrained reinforcement learning method based on stochastic decision horizons, improving sample efficiency and the return-violation trade-off. | reinforcement learning, SAC | |
| 10 | Topology-Aware Revival for Efficient Sparse Training | Proposes Topology-Aware Revival (TAR), improving the performance of static sparse training in reinforcement learning. | reinforcement learning, deep reinforcement learning, SAC | |
| 11 | Contrastive Continual Learning for Model Adaptability in Internet of Things | Proposes contrastive continual learning to address model adaptability in the Internet of Things. | representation learning, contrastive learning, distillation | |
| 12 | CRoSS: A Continual Robotic Simulation Suite for Scalable Reinforcement Learning with High Task Diversity and Realistic Physics Simulation | Proposes CRoSS: a scalable continual robot-learning platform with high task diversity and realistic physics simulation. | reinforcement learning | |
| 13 | The Key to State Reduction in Linear Attention: A Rank-based Perspective | Proposes a rank-based state-reduction method for linear attention, improving efficiency and reducing memory usage. | linear attention | ✅ |
| 14 | Rationality Measurement and Theory for Reinforcement Learning Agents | Proposes rationality measurement and theory to optimize the decision-making of reinforcement learning agents. | reinforcement learning | ✅ |
| 15 | DMFlow: Disordered Materials Generation by Flow Matching | Proposes DMFlow, which generates disordered materials via flow matching, filling the gap left by deep generative models in disordered crystal generation. | flow matching | |
| 16 | Knowledge Distillation for mmWave Beam Prediction Using Sub-6 GHz Channels | Proposes a knowledge-distillation-based mmWave beam prediction method that leverages sub-6 GHz channel information to reduce computational complexity. | distillation | |
| 17 | MirrorLA: Reflecting Feature Map for Vision Linear Attention | MirrorLA addresses the performance degradation of linear attention by reflecting feature maps, improving representational capacity. | linear attention | |
| 18 | Decoupling Time and Risk: Risk-Sensitive Reinforcement Learning with General Discounting | Proposes a risk-sensitive reinforcement learning framework that supports general discounting, decoupling time preference from risk assessment. | reinforcement learning | |
| 19 | Evolving Afferent Architectures: Biologically-inspired Models for Damage-Avoidance Learning | Proposes an afferent-learning framework based on evolved, biologically-inspired models for damage-avoidance learning. | reinforcement learning, policy learning | |
| 20 | From Ambiguity to Action: A POMDP Perspective on Partial Multi-Label Ambiguity and Its Horizon-One Resolution | Proposes a POMDP-based partial multi-label learning framework that resolves label ambiguity and optimizes feature selection. | reinforcement learning, transformer policy | |
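Several entries above (e.g. EMA Policy Gradient) lean on an exponential-moving-average anchor to stabilize policy updates. As a generic illustration only, and not any listed paper's actual algorithm, the standard EMA update keeps a slowly-trailing copy of the online parameters that a KL penalty can be measured against; the function name `ema_update` and the `decay` value are illustrative choices, not from the papers.

```python
def ema_update(anchor, online, decay=0.99):
    """Move each anchor parameter a small step toward the online one.

    The anchor changes by a factor (1 - decay) per step, so it lags
    the online parameters and can serve as a stable reference (e.g.
    for a KL penalty) while the online policy is updated aggressively.
    """
    return [decay * a + (1.0 - decay) * p for a, p in zip(anchor, online)]


# Toy usage: after a few updates the anchor has moved only part of the
# way toward the online parameters (1 - 0.9**3 = 0.271 of the gap).
anchor = [0.0, 0.0]
online = [1.0, 2.0]
for _ in range(3):
    anchor = ema_update(anchor, online, decay=0.9)
print(anchor)  # ~[0.271, 0.542]
```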