cs.AI (2026-02-26)
📊 34 papers in total | 🔗 3 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (24 🔗2)
Pillar 2: RL & Architecture (9 🔗1)
Pillar 1: Robot Control (1)
🔬 Pillar 9: Embodied Foundation Models (24 papers)
🔬 Pillar 2: RL & Architecture (9 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
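| 25 | FactGuard: Agentic Video Misinformation Detection via Reinforcement Learning | Proposes FactGuard to address insufficient reasoning in video misinformation detection. | reinforcement learning, large language model, multimodal | | |
| 26 | RLHFless: Serverless Computing for Efficient RLHF | Proposes RLHFless, which uses serverless computing to train RLHF efficiently, improving resource utilization and reducing cost. | reinforcement learning, RLHF, large language model | | |
| 27 | The Trinity of Consistency as a Defining Principle for General World Models | Proposes the "Trinity of Consistency" principle for general world models and builds CoW-Bench, a benchmark for multi-frame reasoning and generation. | world model, multimodal | | |
| 28 | Agentic AI for Intent-driven Optimization in Cell-free O-RAN | Proposes an agentic AI framework for intent-driven optimization in cell-free O-RAN, improving resource utilization. | reinforcement learning, deep reinforcement learning, DRL | | |
| 29 | Agency and Architectural Limits: Why Optimization-Based Systems Cannot Be Norm-Responsive | Reveals the limits of optimization-based AI systems in norm responsiveness and proposes architectural constraints. | reinforcement learning, RLHF, large language model | | |
| 30 | QSIM: Mitigating Overestimation in Multi-Agent Reinforcement Learning via Action Similarity Weighted Q-Learning | QSIM mitigates overestimation in multi-agent reinforcement learning via action-similarity-weighted Q-learning. | reinforcement learning | ✅ | |
| 31 | Automated Vulnerability Detection in Source Code Using Deep Representation Learning | Proposes a convolutional-neural-network-based method for automated source-code vulnerability detection, improving recall for C vulnerability detection. | representation learning | | |
| 32 | Towards LLM-Empowered Knowledge Tracing via LLM-Student Hierarchical Behavior Alignment in Hyperbolic Space | Proposes L-HAKT, which uses an LLM and hyperbolic space to align student behavior, improving knowledge tracing performance. | contrastive learning, large language model | | |
| 33 | Learning to Generate Secure Code via Token-Level Rewards | Proposes the Vul2Safe framework, which learns to generate secure code via token-level rewards, addressing scarce security data and coarse reward signals. | reinforcement learning, large language model | | |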
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 34 | The AI Research Assistant: Promise, Peril, and a Proof of Concept | Uses human-AI collaboration to discover new error representations and bounds for Hermite quadrature rules. | manipulation | | |