cs.LG (2026-01-30)

📊 45 papers in total | 🔗 8 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (20 🔗4) | Pillar 9: Embodied Foundation Models (20 🔗4) | Pillar 3: Perception & Semantics (2) | Pillar 1: Robot Control (2) | Pillar 6: Video Extraction (1)

🔬 Pillar 2: RL & Architecture (20 papers)

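#	Title	One-line summary	Tags	🔗
1	Local-Global Multimodal Contrastive Learning for Molecular Property Prediction	Proposes the LGM-CL framework, which improves molecular property prediction accuracy through local-global multimodal contrastive learning.	representation learning contrastive learning multimodal
2	Clipping-Free Policy Optimization for Large Language Models	Proposes clipping-free policy optimization to address training instability in large language models.	reinforcement learning large language model instruction following
3	Continual Policy Distillation from Distributed Reinforcement Learning Teachers	Proposes a continual policy distillation framework based on distributed reinforcement learning teachers to address catastrophic forgetting in lifelong-learning agents.	reinforcement learning teacher-student distillation
4	From Absolute to Relative: Rethinking Reward Shaping in Group-Based Reinforcement Learning	Proposes the RLRR framework, which uses relative rewards to address reward sparsity and instability in group-based reinforcement learning.	reinforcement learning reward shaping large language model
5	Agile Reinforcement Learning through Separable Neural Architecture	Proposes SPAN, an agile reinforcement learning method based on a separable neural architecture that improves sample efficiency and policy learning.	reinforcement learning deep reinforcement learning policy learning
6	Automatic Constraint Policy Optimization based on Continuous Constraint Interpolation Framework for Offline Reinforcement Learning	Proposes an automatic constraint policy optimization algorithm based on continuous constraint interpolation to improve offline reinforcement learning performance.	reinforcement learning offline reinforcement learning behavior cloning
7	Elastic Spectral State Space Models for Budgeted Inference	Proposes elastic spectral state space models that are trained once and run inference at any scale at runtime.	SSM state space model distillation
8	Unrewarded Exploration in Large Language Models Reveals Latent Learning from Psychology	Reveals latent learning from unrewarded exploration in large language models, drawing on theory from psychology.	reinforcement learning large language model
9	Offline Reinforcement Learning of High-Quality Behaviors Under Robust Style Alignment	Proposes SCIQL, which achieves high-quality offline reinforcement learning through robust style alignment.	reinforcement learning offline reinforcement learning	✅
10	DRL-Enabled Trajectory Planing for UAV-Assisted VLC: Optimal Altitude and Reward Design	Proposes a DRL-based trajectory planning method for UAV-assisted VLC that optimizes flight altitude and reward function design.	DRL reward design
11	RN-D: Discretized Categorical Actors with Regularized Networks for On-Policy Reinforcement Learning	Proposes an on-policy reinforcement learning method based on discretized categorical actors and regularized networks, improving performance on continuous control tasks.	reinforcement learning deep reinforcement learning
12	Stabilizing Consistency Training: A Flow Map Analysis and Self-Distillation	Stabilizes consistency training through flow map analysis and self-distillation, improving generative model performance.	policy learning distillation
13	On Safer Reinforcement Learning Policies for Sedation and Analgesia in Intensive Care	Proposes reinforcement learning policies that account for both sedation and analgesia and patient survival, improving ICU medication safety.	reinforcement learning deep reinforcement learning
14	CATTO: Balancing Preferences and Confidence in Language Models	Proposes CATTO to address confidence calibration in language models.	DPO direct preference optimization large language model
15	SplineFlow: Flow Matching for Dynamical Systems with B-Spline Interpolants	SplineFlow: proposes a flow matching method based on B-spline interpolants for modeling dynamical systems.	flow matching	✅
16	OptiMAG: Structure-Semantic Alignment via Unbalanced Optimal Transport	Proposes OptiMAG to address inconsistency between structure and semantics in multimodal graphs.	representation learning multimodal
17	Cascaded Flow Matching for Heterogeneous Tabular Data with Mixed-Type Features	Proposes a cascaded flow matching model for generating heterogeneous tabular data with mixed-type features.	flow matching
18	MC-GRPO: Median-Centered Group Relative Policy Optimization for Small-Rollout Reinforcement Learning	MC-GRPO: median-centered group relative policy optimization for small-rollout reinforcement learning.	reinforcement learning	✅
19	Gradual Fine-Tuning for Flow Matching Models	Proposes the Gradual Fine-Tuning (GFT) framework, improving the adaptability and inference efficiency of flow matching models under distribution shift.	flow matching
20	HeaPA: Difficulty-Aware Heap Sampling and On-Policy Query Augmentation for LLM Reinforcement Learning	Proposes HeaPA, which improves LLM reinforcement learning efficiency through heap sampling and on-policy query augmentation.	reinforcement learning	✅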

🔬 Pillar 9: Embodied Foundation Models (20 papers)

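#	Title	One-line summary	Tags	🔗
21	TEON: Tensorized Orthonormalization Beyond Layer-Wise Muon for Large Language Model Pre-Training	TEON: a tensorized orthonormalization optimization method for large language model pre-training.	large language model
22	Probing the Trajectories of Reasoning Traces in Large Language Models	Proposes a trajectory probing method to analyze how decisions evolve and which information contributes during large language model reasoning.	large language model
23	SPICE: Submodular Penalized Information-Conflict Selection for Efficient Large Language Model Training	SPICE: selects data for efficient large language model training via submodular penalized information-conflict selection.	large language model
24	ExplainerPFN: Towards tabular foundation models for model-free zero-shot feature importance estimations	Proposes ExplainerPFN, a tabular foundation model for model-free zero-shot feature importance estimation.	foundation model
25	Vision-Language Models Unlock Task-Centric Latent Actions	Uses vision-language models to unlock task-centric latent actions, improving action representation in complex environments.	vision-language-action
26	FOCUS: DLLMs Know How to Tame Their Compute Bound	FOCUS: significantly improves the inference throughput of diffusion language models (DLLMs) through dynamic compute allocation.	large language model	✅
27	Nested Slice Sampling: Vectorized Nested Sampling for GPU-Accelerated Inference	Proposes Nested Slice Sampling, which accelerates nested sampling inference on GPUs and handles complex multimodal targets.	multimodal
28	TriSpec: Ternary Speculative Decoding via Lightweight Proxy Verification	TriSpec: ternary speculative decoding via lightweight proxy verification, improving LLM inference efficiency.	large language model
29	Behemoth: Benchmarking Unlearning in LLMs Using Fully Synthetic Data	Behemoth: benchmarks unlearning in LLMs using fully synthetic data.	large language model	✅
30	Divide-and-Conquer CoT: RL for Reducing Latency via Parallel Reasoning	Proposes DC-CoT, which reduces the long inference latency of chain-of-thought (CoT) in LLMs through parallel reasoning.	chain-of-thought	✅
31	Learnable Permutation for Structured Sparsity on Transformer Models	Proposes a learnable permutation framework for structured sparsity in Transformer models.	large language model
32	Hierarchical Shift Mixing -- Beyond Dense Attention in Transformers	Proposes Hierarchical Shift Mixing (HSM), which achieves linear-complexity token mixing in Transformers and improves efficiency.	large language model
33	AscendCraft: Automatic Ascend NPU Kernel Generation via DSL-Guided Transcompilation	AscendCraft: automatically generates Ascend NPU kernels via DSL-guided transcompilation.	large language model
34	Do Transformers Have the Ability for Periodicity Generalization?	Studies the limitations of Transformers in periodicity generalization and proposes Coper, a controllable generative benchmark.	large language model
35	Pushing the Boundaries of Natural Reasoning: Interleaved Bonus from Formal-Logic Verification	Proposes a framework guided by formal-logic verification that improves LLM performance on natural reasoning tasks.	large language model
36	TTCS: Test-Time Curriculum Synthesis for Self-Evolving	TTCS: test-time curriculum synthesis for self-evolving, improving the reasoning ability of large language models.	large language model	✅
37	HetCCL: Accelerating LLM Training with Heterogeneous GPUs	HetCCL: a collective communication library that accelerates LLM training with heterogeneous GPUs.	large language model
38	Shattered Compositionality: Counterintuitive Learning Dynamics of Transformers for Arithmetic	Reveals the "shattered compositionality" learning phenomenon of Transformers on arithmetic tasks, challenging conventional understanding.	large language model
39	Transform-Augmented GRPO Improves Pass@k	Proposes TA-GRPO, which improves GRPO's Pass@k on mathematical reasoning through transform augmentation.	large language model
40	Toward Non-Expert Customized Congestion Control	Proposes the NECC framework, which uses large language models to customize congestion control algorithms for non-expert users.	large language model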

🔬 Pillar 3: Perception & Semantics (2 papers)

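#	Title	One-line summary	Tags	🔗	⭐
41	Names Don't Matter: Symbol-Invariant Transformer for Open-Vocabulary Learning	Proposes a symbol-invariant Transformer to address poor generalization in open-vocabulary learning.	open-vocabulary open vocabulary
42	EUGens: Efficient, Unified, and General Dense Layers	Proposes EUGens, efficient and general dense layers that speed up neural network inference and reduce parameter count.	scene reconstruction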

🔬 Pillar 1: Robot Control (2 papers)

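#	Title	One-line summary	Tags	🔗	⭐
43	Solving Inverse Problems with Flow-based Models via Model Predictive Control	Proposes MPC-Flow, which solves inverse problems with flow-based models via model predictive control, enabling efficient conditional generation.	MPC model predictive control
44	Securing Time in Energy IoT: A Clock-Dynamics-Aware Spatio-Temporal Graph Attention Network for Clock Drift Attacks and Y2K38 Failures	Proposes STGAT, a spatio-temporal graph attention network, to address clock drift attacks and Y2K38 failures in energy IoT.	manipulation TAMP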

🔬 Pillar 6: Video Extraction (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
45	Mano: Restriking Manifold Optimization for LLM Training	Proposes the Mano optimizer, which improves LLM training efficiency by reworking manifold optimization.	MANO large language model

