cs.LG (2026-02-04)

📊 42 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (20 papers, 🔗 4 with code)
Pillar 9: Embodied Foundation Models (19 papers, 🔗 1 with code)
Pillar 6: Video Extraction & Matching (1 paper)
Pillar 8: Physics-based Animation (1 paper)
Pillar 1: Robot Control (1 paper)

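🔬 Pillar 2: RL Algorithms & Architecture (20 papers)

# | Title | One-line summary | Tags | 🔗
1 | Safe Urban Traffic Control via Uncertainty-Aware Conformal Prediction and World-Model Reinforcement Learning | Proposes the STREAM-RL framework, which achieves safe urban traffic control through uncertainty-aware conformal prediction and world-model reinforcement learning. | reinforcement learning, policy learning, PPO
2 | Training A Foundation Model to Represent Graphs as Vectors | Proposes a training method for a graph foundation model that represents graphs as vectors, for graph-level tasks such as graph classification and graph clustering. | contrastive learning, foundation model
3 | Rethinking the Trust Region in LLM Reinforcement Learning | Proposes the DPPO algorithm, which improves the stability and efficiency of LLM reinforcement learning by directly estimating policy divergence. | reinforcement learning, PPO, large language model
4 | Thickening-to-Thinning: Reward Shaping via Human-Inspired Learning Dynamics for LLM Reasoning | Proposes T2T, a dynamic reward framework that improves LLM reasoning by mimicking human learning dynamics. | reinforcement learning, reward shaping, large language model
5 | EMA Policy Gradient: Taming Reinforcement Learning for LLMs with EMA Anchor and Top-k KL | EMA-PG: improves the stability and performance of LLM reinforcement learning with an EMA anchor and Top-k KL. | reinforcement learning, large language model
6 | REDistill: Robust Estimator Distillation for Balancing Robustness and Efficiency | REDistill: a robust estimator distillation method that balances robustness and efficiency. | teacher-student, distillation
7 | Beyond Rewards in Reinforcement Learning for Cyber Defence | Proposes a sparse reward mechanism to optimize reinforcement learning for cyber defence. | reinforcement learning, deep reinforcement learning
8 | SAFE: Stable Alignment Finetuning with Entropy-Aware Predictive Control for RLHF | SAFE: achieves stable alignment finetuning for RLHF through entropy-aware predictive control. | PPO, RLHF
9 | Stochastic Decision Horizons for Constrained Reinforcement Learning | Proposes a constrained reinforcement learning method based on stochastic decision horizons, improving sample efficiency and the return-violation trade-off. | reinforcement learning, SAC
10 | Topology-Aware Revival for Efficient Sparse Training | Proposes Topology-Aware Revival (TAR), which improves the performance of static sparse training in reinforcement learning. | reinforcement learning, deep reinforcement learning, SAC
11 | Contrastive Continual Learning for Model Adaptability in Internet of Things | Proposes contrastive continual learning to address model adaptability in the Internet of Things. | representation learning, contrastive learning, distillation
12 | CRoSS: A Continual Robotic Simulation Suite for Scalable Reinforcement Learning with High Task Diversity and Realistic Physics Simulation | Proposes CRoSS, a scalable continual robot learning platform with high task diversity and realistic physics simulation. | reinforcement learning
13 | The Key to State Reduction in Linear Attention: A Rank-based Perspective | Proposes a rank-based state compression method for linear attention that improves efficiency and reduces memory usage. | linear attention
14 | Rationality Measurement and Theory for Reinforcement Learning Agents | Proposes a rationality measure and accompanying theory to improve the decision-making of reinforcement learning agents. | reinforcement learning
15 | DMFlow: Disordered Materials Generation by Flow Matching | Proposes DMFlow, which generates disordered materials via flow matching, filling a gap left by deep generative models in disordered crystal generation. | flow matching
16 | Knowledge Distillation for mmWave Beam Prediction Using Sub-6 GHz Channels | Proposes a knowledge-distillation-based mmWave beam prediction method that uses Sub-6 GHz channel information and reduces computational complexity. | distillation
17 | MirrorLA: Reflecting Feature Map for Vision Linear Attention | MirrorLA uses reflected feature maps to counter the performance degradation of linear attention and improve representational capacity. | linear attention
18 | Decoupling Time and Risk: Risk-Sensitive Reinforcement Learning with General Discounting | Proposes a risk-sensitive reinforcement learning framework that supports general discounting, decoupling time preference from risk assessment. | reinforcement learning
19 | Evolving Afferent Architectures: Biologically-inspired Models for Damage-Avoidance Learning | Proposes an afferent learning framework built on evolved, biologically inspired models for damage-avoidance learning. | reinforcement learning, policy learning
20 | From Ambiguity to Action: A POMDP Perspective on Partial Multi-Label Ambiguity and Its Horizon-One Resolution | Proposes a POMDP-based framework for partial multi-label learning that resolves label ambiguity and optimizes feature selection. | reinforcement learning, transformer policy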

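🔬 Pillar 9: Embodied Foundation Models (19 papers)

# | Title | One-line summary | Tags | 🔗
21 | Toward Effective Multimodal Graph Foundation Model: A Divide-and-Conquer Based Approach | Proposes PLANET, a divide-and-conquer multimodal graph neural network that addresses modality interaction and alignment. | foundation model, multimodal
22 | Dynamical Regimes of Multimodal Diffusion Models | Proposes a theoretical framework for coupled diffusion models that reveals the dynamical mechanisms and timescales of multimodal generation. | multimodal
23 | Billion-Scale Graph Foundation Models | Proposes the GraphBFF framework for building billion-parameter graph neural network foundation models that enable zero-shot learning on general graph data. | foundation model
24 | Active Asymmetric Multi-Agent Multimodal Learning under Uncertainty | Proposes A2MAML, which addresses uncertainty in heterogeneous multimodal perception across multiple agents and improves accident detection rates. | multimodal
25 | Multi-scale hypergraph meets LLMs: Aligning large language models for time series analysis | MSH-LLM: aligns large language models with multi-scale hypergraphs for time series analysis. | large language model
26 | BPDQ: Bit-Plane Decomposition Quantization on a Variable Grid for Large Language Models | BPDQ: bit-plane decomposition quantization on a variable grid for compressing large language models. | large language model
27 | Training Data Efficiency in Multimodal Process Reward Models | Proposes the Balanced-Information Score (BIS), which improves the data efficiency of training multimodal process reward models. | multimodal
28 | Multi-Head LatentMoE and Head Parallel: Communication-Efficient and Deterministic MoE Parallelism | Proposes Multi-Head LatentMoE and Head Parallel for communication-efficient, deterministic MoE parallel training. | large language model, foundation model
29 | EXaMCaP: Subset Selection with Entropy Gain Maximization for Probing Capability Gains of Large Chart Understanding Training Sets | Proposes EXaMCaP, which selects subsets by maximizing entropy gain to efficiently assess the capability gains of chart-understanding training sets. | large language model, multimodal
30 | Subliminal Effects in Your Data: A General Mechanism via Log-Linearity | Proposes the Logit-Linear-Selection method, which reveals hidden effects in datasets that can be used to manipulate model behavior. | large language model
31 | Team, Then Trim: An Assembly-Line LLM Framework for High-Quality Tabular Data Generation | Proposes the Team-then-Trim framework, which uses an LLM pipeline to generate high-quality tabular data and address data scarcity. | large language model
32 | Decomposing Query-Key Feature Interactions Using Contrastive Covariances | Proposes a contrastive covariance method that decomposes the query-key interaction space of Transformers to improve model interpretability. | large language model
33 | Conditional Counterfactual Mean Embeddings: Doubly Robust Estimation and Learning Rates | Proposes the Conditional Counterfactual Mean Embedding (CCME) framework, which characterizes the full conditional distribution of heterogeneous treatment effects. | multimodal
34 | From Data to Behavior: Predicting Unintended Model Behaviors Before Training | Proposes the Data2Behavior task and the MDF method for predicting latent biases and risks of LLMs before training. | large language model
35 | LoRDO: Distributed Low-Rank Optimization with Infrequent Communication | LoRDO: a distributed low-rank optimization method with infrequent communication that addresses bandwidth bottlenecks in large-model training. | foundation model
36 | On the use of LLMs to generate a dataset of Neural Networks | Uses large language models to generate a diverse dataset of neural networks, supporting research on reliability and adaptability. | large language model
37 | UnMaskFork: Test-Time Scaling for Masked Diffusion via Deterministic Action Branching | Proposes UnMaskFork, which improves the test-time performance of masked diffusion models through deterministic action branching. | large language model
38 | Disentangling Causal Importance from Emergent Structure in Multi-Expert Orchestration | Proposes INFORM, which disentangles causal importance from emergent structure in multi-expert orchestration. | large language model
39 | RAPO: Risk-Aware Preference Optimization for Generalizable Safe Reasoning | Proposes the RAPO framework, which improves how well the safe reasoning of large reasoning models generalizes under complex attacks. | chain-of-thought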

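🔬 Pillar 6: Video Extraction & Matching (1 paper)

# | Title | One-line summary | Tags | 🔗
40 | From Sparse Sensors to Continuous Fields: STRIDE for Spatiotemporal Reconstruction | STRIDE: an implicit neural representation method for spatiotemporal reconstruction from sparse sensor data. | sparse sensors, spatiotemporal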

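🔬 Pillar 8: Physics-based Animation (1 paper)

# | Title | One-line summary | Tags | 🔗
41 | Pruning for Generalization: A Transfer-Oriented Spatiotemporal Graph Framework | Proposes TL-GPSTGN, which uses pruning to optimize graph-based spatiotemporal forecasting and improves few-shot and cross-domain generalization. | spatiotemporal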

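🔬 Pillar 1: Robot Control (1 paper)

# | Title | One-line summary | Tags | 🔗
42 | Attack-Resistant Uniform Fairness for Linear and Smooth Contextual Bandits | Proposes attack-resistant, uniformly fair algorithms for linear and smooth contextual bandits, improving system robustness. | manipulation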
