cs.LG(2025-12-02)

📊 4 papers in total

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (2) · Pillar 9: Embodied Foundation Models (2)

🔬 Pillar 2: RL Algorithms & Architecture (2 papers)

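| # | Title | One-sentence summary | Tags |
|---|-------|----------------------|------|
| 1 | SPARK: Stepwise Process-Aware Rewards for Reference-Free Reinforcement Learning | SPARK: proposes a reference-free reinforcement learning framework built on stepwise process-aware rewards, improving mathematical reasoning. | reinforcement learning, chain-of-thought |
| 2 | OptPO: Optimal Rollout Allocation for Test-time Policy Optimization | OptPO: an optimal rollout allocation method for test-time policy optimization. | PPO, large language model |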

🔬 Pillar 9: Embodied Foundation Models (2 papers)

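| # | Title | One-sentence summary | Tags |
|---|-------|----------------------|------|
| 3 | When Refusals Fail: Unstable Safety Mechanisms in Long-Context LLM Agents | A safety study of long-context LLM agents: reveals the negative impact of context length on refusal responses and task performance. | large language model |
| 4 | Real Time Detection and Quantitative Analysis of Spurious Forgetting in Continual Learning | Proposes a shallow-versus-deep alignment framework that detects and mitigates spurious forgetting in continual learning in real time. | large language model |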
