cs.AI(2025-04-07)
📊 35 papers in total | 🔗 4 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (22 🔗3)
Pillar 2: RL Algorithms & Architecture (12 🔗1)
Pillar 3: Spatial Perception & Semantics (1)
🔬 Pillar 9: Embodied Foundation Models (22 papers)
🔬 Pillar 2: RL Algorithms & Architecture (12 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
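| 23 | R2Vul: Learning to Reason about Software Vulnerabilities with Reinforcement Learning and Structured Reasoning Distillation | R2Vul: combines reinforcement learning with structured reasoning distillation to improve software vulnerability detection in code LLMs | reinforcement learning distillation large language model | ||
| 24 | Deep Reinforcement Learning Algorithms for Option Hedging | Compares deep reinforcement learning algorithms for option hedging; MCPG performs best | reinforcement learning deep reinforcement learning DRL | ||
| 25 | Resource-Efficient Beam Prediction in mmWave Communications with Multimodal Realistic Simulation Framework | Proposes a beam prediction method for mmWave communications based on cross-modal relational knowledge distillation, improving resource efficiency | distillation multimodal | ||
| 26 | Algorithm Discovery With LLMs: Evolutionary Search Meets Reinforcement Learning | Proposes LLM-driven evolutionary search with reinforcement learning fine-tuning to accelerate the discovery of combinatorial optimization algorithms | reinforcement learning large language model | ||
| 27 | VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks | VAPO: an efficient and reliable reinforcement learning framework for advanced reasoning tasks | reinforcement learning chain-of-thought | ||
| 28 | GAMDTP: Dynamic Trajectory Prediction with Graph Attention Mamba Network | Proposes GAMDTP to address dynamic trajectory prediction | Mamba SSM | ||
| 29 | Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use | Proposes Step-Wise RL, which uses synthetic data and multi-step reinforcement learning to improve language model performance on reasoning and tool use | reinforcement learning RLHF large language model | ||
| 30 | HypRL: Reinforcement Learning of Control Policies for Hyperproperties | HypRL: proposes a multi-agent reinforcement learning framework for control policies guided by HyperLTL specifications | reinforcement learning reward shaping | ||
| 31 | Interactive Explanations for Reinforcement-Learning Agents | Proposes ASQ-IT, an interactive explanation system that improves users' understanding of reinforcement learning agent behavior and their ability to locate problems | reinforcement learning | ||
| 32 | Unleashing the Power of LLMs in Dense Retrieval with Query Likelihood Modeling | Proposes the LLM-QL model, which uses query likelihood modeling to strengthen LLM performance in dense retrieval | contrastive learning large language model | ||
| 33 | Weak-for-Strong: Training Weak Meta-Agent to Harness Strong Executors | Proposes the W4S framework, in which a weak meta-agent optimizes workflows to improve the performance of strong executors | reinforcement learning large language model | ||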
| 34 | GOTHAM: Graph Class Incremental Learning Framework under Weak Supervision | Proposes the GOTHAM framework for class-incremental learning on graph data under weak supervision | teacher-student distillation | ✅ | |
🔬 Pillar 3: Spatial Perception & Semantics (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 35 | How to evaluate control measures for LLM agents? A trajectory from today to superintelligence | Proposes a control evaluation framework for LLM agents that adapts red-team adversarial strategies as agent capabilities evolve | affordance | | |