cs.LG (2026-01-30)

📊 45 papers total | 🔗 8 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (20 🔗4) · Pillar 9: Embodied Foundation Models (20 🔗4) · Pillar 3: Spatial Perception & Semantics (2) · Pillar 1: Robot Control (2) · Pillar 6: Video Extraction & Matching (1)

🔬 Pillar 2: RL Algorithms & Architecture (20 papers)

# | Title | One-Line Summary | Tags | 🔗
1 | Local-Global Multimodal Contrastive Learning for Molecular Property Prediction | Proposes LGM-CL, a local-global multimodal contrastive learning framework that improves molecular property prediction accuracy. | representation learning, contrastive learning, multimodal
2 | Clipping-Free Policy Optimization for Large Language Models | Proposes clipping-free policy optimization to address training instability in large language models. | reinforcement learning, large language model, instruction following
3 | Continual Policy Distillation from Distributed Reinforcement Learning Teachers | Proposes a continual policy distillation framework built on distributed RL teacher models, addressing catastrophic forgetting in lifelong learning agents. | reinforcement learning, teacher-student distillation
4 | From Absolute to Relative: Rethinking Reward Shaping in Group-Based Reinforcement Learning | Proposes the RLRR framework, using relative rewards to address reward sparsity and instability in group-based reinforcement learning. | reinforcement learning, reward shaping, large language model
5 | Agile Reinforcement Learning through Separable Neural Architecture | Proposes SPAN, an agile RL method built on a separable neural architecture that improves sample efficiency and policy learning. | reinforcement learning, deep reinforcement learning, policy learning
6 | Automatic Constraint Policy Optimization based on Continuous Constraint Interpolation Framework for Offline Reinforcement Learning | Proposes an automatic constrained policy optimization algorithm based on continuous constraint interpolation, improving offline RL performance. | reinforcement learning, offline reinforcement learning, behavior cloning
7 | Elastic Spectral State Space Models for Budgeted Inference | Proposes elastic spectral state space models that train once and run at any scale at inference time. | SSM, state space model, distillation
8 | Unrewarded Exploration in Large Language Models Reveals Latent Learning from Psychology | Reveals latent learning from unrewarded exploration in large language models, drawing on theories from psychology. | reinforcement learning, large language model
9 | Offline Reinforcement Learning of High-Quality Behaviors Under Robust Style Alignment | Proposes SCIQL, achieving high-quality offline RL through robust style alignment. | reinforcement learning, offline reinforcement learning
10 | DRL-Enabled Trajectory Planing for UAV-Assisted VLC: Optimal Altitude and Reward Design | Proposes a DRL-based trajectory planning method for UAV-assisted VLC, optimizing flight altitude and reward design. | DRL, reward design
11 | RN-D: Discretized Categorical Actors with Regularized Networks for On-Policy Reinforcement Learning | Proposes an on-policy RL method with discretized categorical actors and regularized networks, improving performance on continuous control tasks. | reinforcement learning, deep reinforcement learning
12 | Stabilizing Consistency Training: A Flow Map Analysis and Self-Distillation | Stabilizes consistency training via flow-map analysis and self-distillation, improving generative model performance. | policy learning, distillation
13 | On Safer Reinforcement Learning Policies for Sedation and Analgesia in Intensive Care | Proposes RL policies that balance sedation and analgesia with patient survival, improving medication safety in the ICU. | reinforcement learning, deep reinforcement learning
14 | CATTO: Balancing Preferences and Confidence in Language Models | Proposes CATTO to address confidence calibration in language models. | DPO, direct preference optimization, large language model
15 | SplineFlow: Flow Matching for Dynamical Systems with B-Spline Interpolants | SplineFlow: a flow matching method with B-spline interpolants for modeling dynamical systems. | flow matching
16 | OptiMAG: Structure-Semantic Alignment via Unbalanced Optimal Transport | Proposes OptiMAG to address structure-semantics misalignment in multimodal graphs. | representation learning, multimodal
17 | Cascaded Flow Matching for Heterogeneous Tabular Data with Mixed-Type Features | Proposes a cascaded flow matching model for generating heterogeneous tabular data with mixed-type features. | flow matching
18 | MC-GRPO: Median-Centered Group Relative Policy Optimization for Small-Rollout Reinforcement Learning | MC-GRPO: median-centered group relative policy optimization for small-rollout reinforcement learning. | reinforcement learning
19 | Gradual Fine-Tuning for Flow Matching Models | Proposes a gradual fine-tuning (GFT) framework that improves flow matching models' adaptability under distribution shift and their inference efficiency. | flow matching
20 | HeaPA: Difficulty-Aware Heap Sampling and On-Policy Query Augmentation for LLM Reinforcement Learning | Proposes HeaPA, improving LLM RL efficiency via difficulty-aware heap sampling and on-policy query augmentation. | reinforcement learning
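Several entries in this pillar (#4, #18) build on group-based relative rewards in the GRPO family. As background, a minimal sketch of the standard group-relative advantage computation that these methods vary; the function name and toy rewards are illustrative, not taken from any paper listed above:

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each rollout's scalar reward
    against the mean and (population) standard deviation of its own
    group of rollouts for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Toy example: 4 rollouts sampled for the same prompt.
adv = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print(adv)  # zero-mean advantages; the best rollout gets a positive score
```

Variants in this digest change the centering statistic (e.g. a median instead of the mean, as the MC-GRPO title suggests) or replace the absolute reward with a relative one before normalizing.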

🔬 Pillar 9: Embodied Foundation Models (20 papers)

# | Title | One-Line Summary | Tags | 🔗
21 | TEON: Tensorized Orthonormalization Beyond Layer-Wise Muon for Large Language Model Pre-Training | TEON: a tensorized orthonormalization optimizer, beyond layer-wise Muon, for large language model pre-training. | large language model
22 | Probing the Trajectories of Reasoning Traces in Large Language Models | Proposes a trajectory-probing method to analyze how decisions evolve and which information contributes during LLM reasoning. | large language model
23 | SPICE: Submodular Penalized Information-Conflict Selection for Efficient Large Language Model Training | SPICE: selects training data for efficient LLM training via submodular penalized information-conflict selection. | large language model
24 | ExplainerPFN: Towards tabular foundation models for model-free zero-shot feature importance estimations | Proposes ExplainerPFN, a tabular foundation model for model-free zero-shot feature importance estimation. | foundation model
25 | Vision-Language Models Unlock Task-Centric Latent Actions | Uses vision-language models to unlock task-centric latent actions, improving action representations in complex environments. | vision-language-action
26 | FOCUS: DLLMs Know How to Tame Their Compute Bound | FOCUS: substantially improves diffusion language model (DLLM) inference throughput via dynamic compute allocation. | large language model
27 | Nested Slice Sampling: Vectorized Nested Sampling for GPU-Accelerated Inference | Proposes Nested Slice Sampling, accelerating nested-sampling inference on GPUs for complex multimodal targets. | multimodal
28 | TriSpec: Ternary Speculative Decoding via Lightweight Proxy Verification | TriSpec: ternary speculative decoding with lightweight proxy verification, improving LLM inference efficiency. | large language model
29 | Behemoth: Benchmarking Unlearning in LLMs Using Fully Synthetic Data | Behemoth: benchmarks unlearning in LLMs using fully synthetic data. | large language model
30 | Divide-and-Conquer CoT: RL for Reducing Latency via Parallel Reasoning | Proposes DC-CoT, using parallel reasoning to cut the latency of long chain-of-thought in LLMs. | chain-of-thought
31 | Learnable Permutation for Structured Sparsity on Transformer Models | Proposes a learnable permutation framework for structured sparsity in Transformer models. | large language model
32 | Hierarchical Shift Mixing -- Beyond Dense Attention in Transformers | Proposes Hierarchical Shift Mixing (HSM), linear-complexity token mixing for Transformers that improves efficiency. | large language model
33 | AscendCraft: Automatic Ascend NPU Kernel Generation via DSL-Guided Transcompilation | AscendCraft: automatically generates Ascend NPU kernels via DSL-guided transcompilation. | large language model
34 | Do Transformers Have the Ability for Periodicity Generalization? | Studies the limits of Transformers' periodicity generalization and proposes Coper, a controllable generation benchmark. | large language model
35 | Pushing the Boundaries of Natural Reasoning: Interleaved Bonus from Formal-Logic Verification | Proposes a framework guided by formal-logic verification that improves LLM performance on natural reasoning tasks. | large language model
36 | TTCS: Test-Time Curriculum Synthesis for Self-Evolving | TTCS: test-time curriculum synthesis for self-evolution, improving LLM reasoning. | large language model
37 | HetCCL: Accelerating LLM Training with Heterogeneous GPUs | HetCCL: a collective communication library that accelerates LLM training on heterogeneous GPUs. | large language model
38 | Shattered Compositionality: Counterintuitive Learning Dynamics of Transformers for Arithmetic | Reveals Transformers' counterintuitive "shattered compositionality" learning dynamics on arithmetic tasks, challenging conventional wisdom. | large language model
39 | Transform-Augmented GRPO Improves Pass@k | Proposes TA-GRPO, improving GRPO's Pass@k on mathematical reasoning via transform augmentation. | large language model
40 | Toward Non-Expert Customized Congestion Control | Proposes the NECC framework, which uses large language models to customize congestion control algorithms for non-expert users. | large language model
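One entry in this pillar (#28, TriSpec) extends speculative decoding. For orientation, a hedged sketch of the generic accept/reject rule that such methods build on; the function, toy vocabulary, and distributions are illustrative and say nothing about TriSpec's actual ternary design:

```python
import random

def speculative_accept(p_target, p_draft, draft_tokens, rng):
    """Generic speculative-decoding acceptance: keep draft token t with
    probability min(1, p_target[t] / p_draft[t]); stop at the first
    rejection, where a real decoder would resample from the residual
    (target minus draft) distribution."""
    accepted = []
    for t in draft_tokens:
        if rng.random() < min(1.0, p_target[t] / p_draft[t]):
            accepted.append(t)
        else:
            break
    return accepted

# Toy 2-token vocabulary; when draft == target, every proposal is accepted.
p = {0: 0.6, 1: 0.4}
print(speculative_accept(p, p, [0, 1, 0], random.Random(0)))  # [0, 1, 0]
```

The accepted tokens are always a prefix of the draft, so throughput gains depend on how often the draft (or, here, a lightweight proxy verifier) agrees with the target model.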

🔬 Pillar 3: Spatial Perception & Semantics (2 papers)

# | Title | One-Line Summary | Tags | 🔗
41 | Names Don't Matter: Symbol-Invariant Transformer for Open-Vocabulary Learning | Proposes a symbol-invariant Transformer to address poor generalization in open-vocabulary learning. | open-vocabulary
42 | EUGens: Efficient, Unified, and General Dense Layers | Proposes EUGens, efficient, unified, and general dense layers that speed up neural network inference while reducing parameter count. | scene reconstruction

🔬 Pillar 1: Robot Control (2 papers)

# | Title | One-Line Summary | Tags | 🔗
43 | Solving Inverse Problems with Flow-based Models via Model Predictive Control | Proposes MPC-Flow, solving inverse problems for flow-based models via model predictive control for efficient conditional generation. | MPC, model predictive control
44 | Securing Time in Energy IoT: A Clock-Dynamics-Aware Spatio-Temporal Graph Attention Network for Clock Drift Attacks and Y2K38 Failures | Proposes STGAT, a clock-dynamics-aware spatio-temporal graph attention network addressing clock-drift attacks and Y2K38 failures in energy IoT. | manipulation, TAMP

🔬 Pillar 6: Video Extraction & Matching (1 paper)

# | Title | One-Line Summary | Tags | 🔗
45 | Mano: Restriking Manifold Optimization for LLM Training | Proposes the Mano optimizer, improving LLM training efficiency by restriking manifold optimization. | MANO, large language model
