cs.LG (2026-02-03)

📊 54 papers | 🔗 7 with code

🎯 Topic navigation

Pillar 2: RL Algorithms & Architecture (29, 🔗 5) · Pillar 9: Embodied Foundation Models (22, 🔗 2) · Pillar 1: Robot Control (2) · Pillar 4: Generative Motion (1)

🔬 Pillar 2: RL Algorithms & Architecture (29 papers)

| # | Title | One-line takeaway | Tags |
|---|---|---|---|
| 1 | medR: Reward Engineering for Clinical Offline Reinforcement Learning via Tri-Drive Potential Functions | Reward engineering for clinical offline RL via tri-drive potential functions. | reinforcement learning, policy learning, offline reinforcement learning |
| 2 | Reinforcement Learning with Promising Tokens for Large Language Models | Proposes the RLPT framework, using promising tokens in RL to improve LLM reasoning. | reinforcement learning, large language model |
| 3 | Entropy-Gated Selective Policy Optimization: Token-Level Gradient Allocation for Hybrid Training of Large Language Models | Proposes Entropy-Gated Selective Policy Optimization (EGSPO) for token-level gradient allocation in hybrid LLM training. | reinforcement learning, PPO, large language model |
| 4 | Robust Representation Learning in Masked Autoencoders | Shows that representations learned by masked autoencoders (MAE) are highly robust, especially for image classification. | representation learning, masked autoencoder, MAE |
| 5 | Bridging Online and Offline RL: Contextual Bandit Learning for Multi-Turn Code Generation | Proposes Cobalt, bridging online and offline RL to improve multi-turn code generation. | reinforcement learning, offline RL, large language model |
| 6 | CoCoEmo: Composable and Controllable Human-Like Emotional TTS via Activation Steering | Proposes CoCoEmo: composable, controllable human-like emotional TTS via activation steering. | flow matching, motion synthesis |
| 7 | Self-Hinting Language Models Enhance Reinforcement Learning | Proposes self-hint-aligned GRPO to tackle sparse rewards. | reinforcement learning, privileged information, large language model |
| 8 | Antidistillation Fingerprinting | Proposes antidistillation fingerprinting (ADFP), improving model provenance tracing with less impact on model utility. | distillation, large language model |
| 9 | CoGenCast: A Coupled Autoregressive-Flow Generative Framework for Time Series Forecasting | CoGenCast: a coupled autoregressive-flow generative model for time-series forecasting. | flow matching, large language model, multimodal |
| 10 | Not All Negative Samples Are Equal: LLMs Learn Better from Plausible Reasoning | Proposes PNS, improving LLM reasoning via high-quality (plausible) negative samples. | reinforcement learning, large language model, chain-of-thought |
| 11 | An Approximate Ascent Approach To Prove Convergence of PPO | Proposes an approximate-ascent approach to prove PPO convergence and address advantage-function estimation. | reinforcement learning, deep reinforcement learning, PPO |
| 12 | TMS: Trajectory-Mixed Supervision for Reward-Free, On-Policy SFT | Proposes Trajectory-Mixed Supervision (TMS) to address catastrophic forgetting caused by policy drift in SFT. | reinforcement learning, large language model, instruction following |
| 13 | Understanding and Exploiting Weight Update Sparsity for Communication-Efficient Distributed RL | PULSE: exploits weight-update sparsity for communication-efficient distributed RL. | reinforcement learning, PULSE, large language model |
| 14 | StepScorer: Accelerating Reinforcement Learning with Step-wise Scoring and Psychological Regret Modeling | Accelerates RL via step-wise scoring and psychological regret modeling, easing convergence under sparse rewards. | reinforcement learning, PPO |
| 15 | Neural Predictor-Corrector: Solving Homotopy Problems with Reinforcement Learning | Proposes the Neural Predictor-Corrector (NPC), solving homotopy problems with RL. | reinforcement learning |
| 16 | SAFE-KD: Risk-Controlled Early-Exit Distillation for Vision Backbones | Proposes SAFE-KD, improving vision-backbone efficiency via risk-controlled early-exit distillation. | distillation |
| 17 | Preference-based Conditional Treatment Effects and Policy Learning | Proposes a preference-based conditional-treatment-effect framework for modeling heterogeneous effects and policy learning. | policy learning |
| 18 | Efficient Estimation of Kernel Surrogate Models for Task Attribution | Proposes kernel surrogate models that efficiently estimate the influence of training tasks on a target task. | reinforcement learning, large language model |
| 19 | Reasoning Cache: Continual Improvement Over Long Horizons via Short-Horizon RL | Proposes the Reasoning Cache (RC) algorithm: continual improvement of LLMs on long-horizon reasoning via short-horizon RL. | reinforcement learning, large language model |
| 20 | Conditional Flow Matching for Visually-Guided Acoustic Highlighting | Proposes conditional flow matching for visually guided acoustic highlighting, resolving ambiguity in audio remixing. | flow matching |
| 21 | ContraLog: Log File Anomaly Detection with Contrastive Learning and Masked Language Modeling | ContraLog: log-file anomaly detection with contrastive learning and masked language modeling. | contrastive learning |
| 22 | Reinforcement Fine-Tuning for History-Aware Dense Retriever in RAG | Proposes RL-based fine-tuning of a history-aware dense retriever for RAG, improving multi-hop reasoning. | reinforcement learning, large language model |
| 23 | Beyond Variance: Prompt-Efficient RLVR via Rare-Event Amplification and Bidirectional Pairing | Proposes prompt-efficient RLVR via rare-event amplification and bidirectional pairing, improving LLM performance on deterministic reasoning tasks. | reinforcement learning, large language model |
| 24 | Information-Theoretic Multi-Model Fusion for Target-Oriented Adaptive Sampling in Materials Design | Proposes information-theoretic multi-model fusion for target-oriented adaptive sampling in materials design. | distillation, multimodal |
| 25 | From Scalar Rewards to Potential Trends: Shaping Potential Landscapes for Model-Based Reinforcement Learning | SLOPE: shapes potential landscapes to handle sparse-reward settings in model-based RL. | reinforcement learning |
| 26 | Prompt Augmentation Scales up GRPO Training on Mathematical Reasoning | Proposes Prompt Augmentation to stably scale up GRPO training on mathematical reasoning, significantly improving performance. | reinforcement learning, large language model |
| 27 | Quantized Evolution Strategies: High-precision Fine-tuning of Quantized LLMs at Low-precision Cost | Proposes Quantized Evolution Strategies (QES): high-precision fine-tuning of quantized LLMs at low-precision cost. | reinforcement learning, large language model |
| 28 | CoBA-RL: Capability-Oriented Budget Allocation for Reinforcement Learning in LLMs | CoBA-RL: capability-adaptive budget allocation for RL in LLMs. | reinforcement learning |
| 29 | Asymmetric Hierarchical Anchoring for Audio-Visual Joint Representation: Resolving Information Allocation Ambiguity for Robust Cross-Modal Generalization | Proposes Asymmetric Hierarchical Anchoring (AHA) to resolve information-allocation ambiguity in cross-modal generalization. | representation learning, distillation |

🔬 Pillar 9: Embodied Foundation Models (22 papers)

| # | Title | One-line takeaway | Tags |
|---|---|---|---|
| 30 | R1-SyntheticVL: Is Synthetic Data from Generative Models Ready for Multimodal Large Language Model? | Proposes the CADS framework for synthesizing high-quality multimodal data, improving MLLM performance on complex real-world tasks. | large language model, multimodal |
| 31 | On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models | Proposes a reinforcement fine-tuning framework based on entropy-dynamics analysis, optimizing the exploration-exploitation balance in LLMs. | large language model |
| 32 | Unveiling Covert Toxicity in Multimodal Data via Toxicity Association Graphs: A Graph-Based Metric and Interpretable Detection Framework | Proposes a toxicity-association-graph framework for detecting covert toxicity in multimodal data. | multimodal |
| 33 | Beyond Suffixes: Token Position in GCG Adversarial Attacks on Large Language Models | Reveals a blind spot of GCG attacks: adversarial-token position significantly affects LLM jailbreak success rates. | large language model |
| 34 | Generalizable and Interpretable RF Fingerprinting with Shapelet-Enhanced Large Language Models | Proposes a shapelet-enhanced LLM framework for generalizable and interpretable RF fingerprinting. | large language model |
| 35 | FedKRSO: Communication and Memory Efficient Federated Fine-Tuning of Large Language Models | FedKRSO: a communication- and memory-efficient method for federated full-parameter fine-tuning of LLMs. | large language model |
| 36 | From Zero to Hero: Advancing Zero-Shot Foundation Models for Tabular Outlier Detection | OUTFORMER: advances zero-shot foundation models for tabular outlier detection via mixed priors and adaptive curriculum learning. | foundation model |
| 37 | PLATE: Plasticity-Tunable Efficient Adapters for Geometry-Aware Continual Learning | Proposes PLATE for continual learning without access to old-task data. | foundation model |
| 38 | UniGeM: Unifying Data Mixing and Selection via Geometric Exploration and Mining | UniGeM: unifies data mixing and selection via geometric exploration and mining, improving LLM data efficiency. | large language model |
| 39 | Anytime Pretraining: Horizon-Free Learning-Rate Schedules with Weight Averaging | Proposes horizon-free learning-rate schedules with weight averaging to optimize language-model pretraining. | large language model |
| 40 | Conflict-Resolving and Sharpness-Aware Minimization for Generalized Knowledge Editing with Multiple Updates | Proposes the CoRSA framework: generalized knowledge editing via conflict resolution and sharpness-aware minimization. | large language model |
| 41 | LLM-Inspired Pretrain-Then-Finetune for Small-Data, Large-Scale Optimization | Proposes an LLM-inspired pretrain-then-finetune framework for small-data, large-scale optimization decisions. | large language model |
| 42 | Universal One-third Time Scaling in Learning Peaked Distributions | Shows that softmax cross-entropy yields power-law convergence of LLM training loss, and identifies a universal one-third time-scaling law. | large language model |
| 43 | Lookahead Path Likelihood Optimization for Diffusion LLMs | Proposes POKE-SMC, optimizing unmasking-path selection in diffusion LLMs via path likelihood, improving reasoning accuracy. | large language model |
| 44 | MeKi: Memory-based Expert Knowledge Injection for Efficient LLM Scaling | MeKi: scales LLMs using storage capacity, easing LLM deployment on edge devices. | large language model |
| 45 | Merging Beyond: Streaming LLM Updates via Activation-Guided Rotations | Proposes streaming LLM updates via activation-guided rotations, exceeding the converged performance of SFT. | large language model |
| 46 | Topology Matters: A Cautionary Case Study of Graph SSL on Neuro-Inspired Benchmarks | Reveals that graph self-supervised learning is insensitive to topology on neuro-inspired benchmarks. | multimodal |
| 47 | DynSplit-KV: Dynamic Semantic Splitting for KVCache Compression in Efficient Long-Context LLM Inference | DynSplit-KV: dynamic semantic splitting for KV-cache compression in long-context LLM inference. | large language model |
| 48 | Contrastive Concept-Tree Search for LLM-Assisted Algorithm Discovery | Proposes Contrastive Concept-Tree Search (CCTS), improving the efficiency and interpretability of LLM-assisted algorithm discovery. | large language model |
| 49 | Clarify Before You Draw: Proactive Agents for Robust Text-to-CAD Generation | ProCAD: improves the robustness of text-to-CAD generation via proactive clarification. | large language model |
| 50 | Rethinking Music Captioning with Music Metadata LLMs | Proposes music captioning based on music metadata, improving caption quality and flexibility. | large language model |
| 51 | NLI: Non-uniform Linear Interpolation Approximation of Nonlinear Operations for Efficient LLMs Inference | Proposes NLI, non-uniform linear interpolation that efficiently approximates nonlinear operations in LLMs, reducing inference cost. | large language model |

🔬 Pillar 1: Robot Control (2 papers)

| # | Title | One-line takeaway | Tags |
|---|---|---|---|
| 52 | Reparameterization Flow Policy Optimization | Proposes RFO, combining flow policies with reparameterized gradients to improve RL sample efficiency. | quadruped, locomotion, manipulation |
| 53 | Chain-of-Goals Hierarchical Policy for Long-Horizon Offline Goal-Conditioned RL | Proposes the Chain-of-Goals Hierarchical Policy (CoGHP) for long-horizon offline goal-conditioned RL. | manipulation, reinforcement learning, chain-of-thought |

🔬 Pillar 4: Generative Motion (1 paper)

| # | Title | One-line takeaway | Tags |
|---|---|---|---|
| 54 | SATORIS-N: Spectral Analysis based Traffic Observation Recovery via Informed Subspaces and Nuclear-norm minimization | Proposes SATORIS-N for reconstructing missing data in traffic-density matrices. | penetration, spatiotemporal |
