cs.AI (2025-09-28)
📊 26 papers in total | 🔗 2 with code
🎯 Interest Area Navigation
🔬 Pillar 9: Embodied Foundation Models (17 papers)
🔬 Pillar 2: RL Algorithms & Architecture (9 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
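| 18 | Conditional Advantage Estimation for Reinforcement Learning in Large Reasoning Models | Proposes CANON, a conditional advantage estimation method that improves the performance of large reasoning models under reinforcement learning. | reinforcement learning, large language model | ||
| 19 | SAC-Opt: Semantic Anchors for Iterative Correction in Optimization Modeling | Proposes SAC-Opt, which uses semantic anchors to iteratively correct logical errors in optimization modeling. | SAC, large language model | ||
| 20 | Formalization Driven LLM Prompt Jailbreaking via Reinforcement Learning | Proposes the PASS framework, which uses reinforcement learning and formalized descriptions to make LLM prompt jailbreak attacks stealthier and more effective. | reinforcement learning, large language model | ||
| 21 | Taught Well Learned Ill: Towards Distillation-conditional Backdoor Attack | Proposes SCAR, a distillation-conditional backdoor attack that injects a stealthy backdoor into the teacher model. | distillation | ✅ | |
| 22 | How LLMs Learn to Reason: A Complex Network Perspective | Proposes the Annealed-RLVR algorithm, which improves LLM reasoning by regulating the topology of the concept network. | reinforcement learning, large language model | ||
| 23 | Continual Learning to Generalize Forwarding Strategies for Diverse Mobile Wireless Networks | Proposes a general forwarding strategy based on continual learning that improves mobile wireless network performance across diverse scenarios. | reinforcement learning, deep reinforcement learning, DRL | ||
| 24 | Gradient Coupling: The Hidden Barrier to Generalization in Agentic Reinforcement Learning | Proposes a gradient coupling theory and improves reinforcement learning generalization by decoupling action embeddings. | reinforcement learning | ||
| 25 | EAPO: Enhancing Policy Optimization with On-Demand Expert Assistance | EAPO: enhances policy optimization with on-demand expert assistance to improve LLM reasoning. | reinforcement learning, large language model | ||
| 26 | Reasoning Scaffolding: Distilling the Flow of Thought from LLMs | Proposes the Reasoning Scaffolding framework, which improves the reasoning ability and logical consistency of small models. | distillation, large language model |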