cs.LG (2025-12-28)

📊 21 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (12, 🔗 2) · Pillar 9: Embodied Foundation Models (6) · Pillar 1: Robot Control (2) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 2: RL Algorithms & Architecture (12 papers)

| # | Title | One-sentence takeaway | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Multimodal Functional Maximum Correlation for Emotion Recognition | Proposes a Multimodal Functional Maximum Correlation (MFMC) framework to improve emotion recognition performance. | representation learning, multimodal | |
| 2 | A Note on Hybrid Online Reinforcement and Imitation Learning for LLMs: Formulations and Algorithms | Proposes a unified framework for hybrid online reinforcement and imitation learning for LLMs, improving fine-tuning efficiency. | reinforcement learning, imitation learning, large language model | |
| 3 | Trust Region Masking for Long-Horizon LLM Reinforcement Learning | Proposes Trust Region Masking to address trust-region breakdown in long-horizon LLM reinforcement learning (sketch after this table). | reinforcement learning, PPO, large language model | |
| 4 | Taming the Tail: Stable LLM Reinforcement Learning via Dynamic Vocabulary Pruning | Proposes stable LLM reinforcement learning via dynamic vocabulary pruning, addressing the training-inference mismatch (sketch after this table). | reinforcement learning, large language model | |
| 5 | ReDiF: Reinforced Distillation for Few Step Diffusion | Proposes ReDiF, an RL-based distillation framework for diffusion models, enabling efficient generation in fewer steps. | reinforcement learning, distillation | |
| 6 | Breaking the Memory Wall: Exact Analytical Differentiation via Tiled Operator-Space Evolution | Proposes the PGF framework, which uses tiled operator-space evolution to make exact analytical differentiation in selective state-space models memory-efficient. | SSM, state space model, PULSE | |
| 7 | Value-guided action planning with JEPA world models | Proposes value-guided action planning on top of JEPA world models, improving performance on control tasks (sketch after this table). | world model | |
| 8 | APO: Alpha-Divergence Preference Optimization | Proposes Alpha-Divergence Preference Optimization (APO), which smoothly interpolates between forward and reverse KL divergence within an anchored framework, improving the stability and performance of alignment training (see the divergence formula after this table). | reinforcement learning, PPO, distillation | |
| 9 | Sat-EnQ: Satisficing Ensembles of Weak Q-Learners for Reliable and Compute-Efficient Reinforcement Learning | Sat-EnQ: reliable and compute-efficient reinforcement learning via satisficing ensembles of weak Q-learners (sketch after this table). | reinforcement learning | |
| 10 | Causal-Policy Forest for End-to-End Policy Learning | Proposes the Causal-Policy Forest algorithm for end-to-end causal policy learning. | policy learning | |
| 11 | Long-Range Distillation: Distilling 10,000 Years of Simulated Climate into Long Timestep AI Weather Models | Proposes long-range distillation, using AI-generated climate data to improve long-timestep AI weather model forecasts. | distillation | |
| 12 | FoldAct: Efficient and Stable Context Folding for Long-Horizon Search Agents | Proposes FoldAct to make context folding efficient and stable for long-horizon search agents. | reinforcement learning, large language model | |
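
For paper 3, the one-liner leaves "masking" underspecified. Below is a minimal sketch of one plausible reading, in which tokens whose importance ratio has left the trust region are masked out of a PPO-style loss rather than merely clipped; the function name, threshold, and normalization are illustrative assumptions, not the paper's formulation.

```python
import torch

def masked_ppo_loss(logp_new, logp_old, advantages, eps=0.2):
    """PPO-style token loss where out-of-trust-region tokens are masked
    out of the objective instead of only being clipped.
    Illustrative sketch; not the paper's exact formulation."""
    ratio = torch.exp(logp_new - logp_old)                   # per-token importance ratio
    mask = ((ratio > 1 - eps) & (ratio < 1 + eps)).float()   # trust-region membership
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps) * advantages
    per_token = torch.min(unclipped, clipped) * mask         # drop out-of-region tokens
    # Normalize by surviving tokens so long sequences keep a stable scale.
    return -per_token.sum() / mask.sum().clamp(min=1.0)
```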
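
For paper 4, a common way to reduce training-inference mismatch is to prune the low-probability tail of the vocabulary before computing the training loss, so the trained distribution matches a truncated sampler. The nucleus-style rule below is an assumed stand-in; the paper's dynamic pruning criterion may differ.

```python
import torch

def prune_tail_logits(logits, top_p=0.99):
    """Mask logits outside the top-p nucleus so the training-time
    distribution matches a truncated inference-time sampler.
    Hypothetical pruning rule, not the paper's."""
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_idx = probs.sort(dim=-1, descending=True)
    cum = sorted_probs.cumsum(dim=-1)
    keep_sorted = cum - sorted_probs < top_p          # smallest prefix reaching top_p
    keep = torch.zeros_like(keep_sorted).scatter(-1, sorted_idx, keep_sorted)
    return logits.masked_fill(~keep, float("-inf"))   # prune the tail
```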
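
For paper 7, value-guided planning with a latent world model typically follows a sample-score-select recipe. The random-shooting sketch below uses hypothetical `encoder`, `dynamics`, and `value_head` modules and shows only the generic recipe, not the paper's architecture.

```python
import torch

def plan_action(encoder, dynamics, value_head, obs,
                n_samples=256, horizon=5, act_dim=4):
    """Roll candidate action sequences through a latent dynamics model
    and return the first action of the highest-value rollout.
    Assumes obs is a batch of one observation."""
    z = encoder(obs).expand(n_samples, -1)               # shared latent start state
    actions = torch.randn(n_samples, horizon, act_dim)   # candidate action sequences
    for t in range(horizon):
        z = dynamics(z, actions[:, t])                   # one latent rollout step
    scores = value_head(z).squeeze(-1)                   # value of the final latent
    return actions[scores.argmax(), 0]                   # best first action
```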
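
For paper 8, "smooth interpolation between forward and reverse KL" is the defining property of the α-divergence family. For reference, one standard convention (the paper may parameterize it differently):

```latex
D_\alpha(p \,\|\, q) = \frac{1}{\alpha(\alpha - 1)}
  \left( \int p(x)^{\alpha}\, q(x)^{1-\alpha}\, dx - 1 \right),
\qquad
\lim_{\alpha \to 1} D_\alpha = \mathrm{KL}(p \,\|\, q),
\qquad
\lim_{\alpha \to 0} D_\alpha = \mathrm{KL}(q \,\|\, p).
```

Sweeping α thus trades the mode-covering behavior of the forward KL against the mode-seeking behavior of the reverse KL.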
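
For paper 9, "satisficing" usually means accepting any action that is good enough rather than maximizing. A minimal sketch of such a rule over an ensemble of weak tabular Q-learners follows; it is one reading of the title, not Sat-EnQ's actual algorithm.

```python
import numpy as np

def satisficing_action(q_tables, state, aspiration):
    """Accept the first action whose pessimistic (ensemble-minimum) value
    clears the aspiration level; otherwise act greedily on the ensemble
    mean. q_tables: list of state -> action-value arrays."""
    q = np.stack([table[state] for table in q_tables])   # (n_learners, n_actions)
    pessimistic = q.min(axis=0)
    good = np.flatnonzero(pessimistic >= aspiration)
    if good.size:                                        # satisfice: "good enough"
        return int(good[0])
    return int(q.mean(axis=0).argmax())                  # fallback: ensemble-greedy
```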

🔬 Pillar 9: Embodied Foundation Models (6 papers)

| # | Title | One-sentence takeaway | Tags | 🔗 |
|---|---|---|---|---|
| 13 | Fusion or Confusion? Multimodal Complexity Is Not All You Need | Proposes the SimBaMM multimodal learning baseline, challenging the necessity of complex multimodal architectures. | multimodal | |
| 14 | Merge before Forget: A Single LoRA Continual Learning via Continual Merging | Proposes single-LoRA continual learning via continual merging, effectively mitigating catastrophic forgetting (sketch after this table). | large language model | |
| 15 | Debugging Tabular Log as Dynamic Graphs | Proposes the GraphLogDebugger framework, which debugs tabular logs as dynamic graphs for better efficiency and scalability. | large language model | |
| 16 | Theory and Algorithms for Learning with Multi-Class Abstention and Multi-Expert Deferral | Targets LLM hallucination and high inference cost with theory and algorithms for multi-class abstention and multi-expert deferral. | large language model | |
| 17 | Understanding the Mechanisms of Fast Hyperparameter Transfer | Proposes a framework for fast hyperparameter transfer, revealing the optimization mechanisms behind model-width scaling under μP. | large language model | |
| 18 | Bridging Global Intent with Local Details: A Hierarchical Representation Approach for Semantic Validation in Text-to-SQL | Proposes HEROSQL, which improves semantic-validation accuracy for Text-to-SQL via hierarchical representations and a nested message-passing network. | large language model | |
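
For paper 14, "continual merging" into a single LoRA suggests a running combination of adapters across tasks. The sketch below folds each new low-rank update into one accumulated delta by a running average; this is one plausible merging rule, not necessarily the paper's operator.

```python
import torch

def merge_lora_delta(merged_delta, new_B, new_A, n_tasks_seen):
    """Fold a newly trained LoRA update (delta_W = B @ A) into a single
    accumulated adapter via a running average over tasks seen so far.
    Illustrative merging rule only."""
    w = 1.0 / (n_tasks_seen + 1)                       # weight of the newest task
    return (1.0 - w) * merged_delta + w * (new_B @ new_A)
```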

🔬 Pillar 1: Robot Control (2 papers)

| # | Title | One-sentence takeaway | Tags | 🔗 |
|---|---|---|---|---|
| 19 | Adaptive Trust Consensus for Blockchain IoT: Comparing RL, DRL, and MARL Against Naive, Collusive, Adaptive, Byzantine, and Sleeper Attacks | Proposes a reinforcement-learning-based adaptive trust consensus mechanism to defend blockchain IoT against sophisticated attacks. | manipulation, reinforcement learning, DRL | |
| 20 | TEACH: Temporal Variance-Driven Curriculum for Reinforcement Learning | Proposes TEACH, a temporal variance-driven curriculum framework that accelerates multi-goal reinforcement learning (sketch after this table). | manipulation, reinforcement learning, curriculum learning | |
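
For paper 20, a temporal variance-driven curriculum plausibly samples goals whose recent outcomes fluctuate most, since those are the ones currently being learned. The sketch below implements that reading with a softmax over per-goal outcome variance; TEACH's actual criterion may differ.

```python
import numpy as np

def sample_goal(success_history, temperature=1.0):
    """Sample a goal with probability increasing in the temporal variance
    of its recent 0/1 outcomes. success_history: dict goal_id -> list of
    recent binary outcomes. Illustrative rule only."""
    goals = list(success_history)
    var = np.array([np.var(success_history[g]) if len(success_history[g]) > 1
                    else 0.25                        # max-variance prior for new goals
                    for g in goals])
    logits = var / temperature
    p = np.exp(logits - logits.max())                # numerically stable softmax
    p /= p.sum()
    return np.random.choice(goals, p=p)
```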

🔬 Pillar 8: Physics-based Animation (1 paper)

| # | Title | One-sentence takeaway | Tags | 🔗 |
|---|---|---|---|---|
| 21 | PI-MFM: Physics-informed multimodal foundation model for solving partial differential equations | Proposes PI-MFM for efficiently solving partial differential equations (sketch after this table). | spatiotemporal, foundation model, multimodal | |
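
For paper 21, the "physics-informed" ingredient is usually a PDE residual penalty evaluated by automatic differentiation. The sketch below shows that standard term for a 1-D heat equation u_t = u_xx; PI-MFM's actual losses and multimodal inputs are not reproduced here.

```python
import torch

def pde_residual_loss(model, x, t):
    """Physics-informed residual for the 1-D heat equation u_t = u_xx,
    computed with autograd. Generic PINN-style term, not PI-MFM's loss."""
    x = x.requires_grad_(True)
    t = t.requires_grad_(True)
    u = model(x, t)                                              # predicted solution
    u_t = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
    u_x = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    u_xx = torch.autograd.grad(u_x.sum(), x, create_graph=True)[0]
    return ((u_t - u_xx) ** 2).mean()                            # squared PDE residual
```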
