cs.LG (2025-12-02)
📊 4 papers in total
🎯 Navigation by interest area
🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
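| 1 | SPARK: Stepwise Process-Aware Rewards for Reference-Free Reinforcement Learning | SPARK: proposes a reference-free reinforcement learning framework based on stepwise process-aware rewards, improving mathematical reasoning. | reinforcement learning, chain-of-thought | | |
| 2 | OptPO: Optimal Rollout Allocation for Test-time Policy Optimization | OptPO: an optimal rollout allocation method for test-time policy optimization. | PPO, large language model | | |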
🔬 Pillar 9: Embodied Large Models (Embodied Foundation Models) (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
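| 3 | When Refusals Fail: Unstable Safety Mechanisms in Long-Context LLM Agents | A safety study of long-context LLM agents: reveals the negative impact of context length on refusal responses and task performance. | large language model | | |
| 4 | Real Time Detection and Quantitative Analysis of Spurious Forgetting in Continual Learning | Proposes a shallow-versus-deep alignment framework to detect and mitigate spurious forgetting in continual learning in real time. | large language model | | |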