cs.LG (2026-02-09)
📊 19 papers in total | 🔗 3 with code
🎯 Interest Area Navigation
Pillar 2: RL & Architecture (9 🔗2)
Pillar 9: Embodied Foundation Models (6)
Pillar 1: Robot Control (3 🔗1)
Pillar 7: Motion Retargeting (1)
🔬 Pillar 2: RL & Architecture (9 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | AnomSeer: Reinforcing Multimodal LLMs to Reason for Time-Series Anomaly Detection | AnomSeer tackles anomaly detection by reinforcing the time-series reasoning ability of multimodal LLMs. | reinforcement learning, large language model, multimodal | | |
| 2 | Contextual Rollout Bandits for Reinforcement Learning with Verifiable Rewards | Proposes Contextual Rollout Bandits to improve mathematical reasoning in reinforcement learning with verifiable rewards. | reinforcement learning, large language model | | |
| 3 | Dreaming in Code for Curriculum Learning in Open-Ended Worlds | DiCode: generates environments as code for curriculum learning, improving agent capability in open-ended worlds. | curriculum learning, foundation model | ✅ | |
| 4 | Bayesian Preference Learning for Test-Time Steerable Reward Models | Proposes Variational In-Context Reward Modeling (ICRM) for reward models that are steerable at test time. | reinforcement learning, preference learning | | |
| 5 | StealthRL: Reinforcement Learning Paraphrase Attacks for Multi-Detector Evasion of AI-Text Detectors | StealthRL: a reinforcement-learning-based adversarial paraphrase attack on AI-text detectors. | reinforcement learning | ✅ | |
| 6 | Dr. MAS: Stable Reinforcement Learning for Multi-Agent LLM Systems | Dr. MAS: a stable reinforcement learning training framework for multi-agent LLM systems. | reinforcement learning | | |
| 7 | LLaDA2.1: Speeding Up Text Diffusion via Token Editing | LLaDA2.1: speeds up text diffusion model inference via token editing, balancing speed and quality. | reinforcement learning, instruction following | | |
| 8 | When Do Multi-Agent Systems Outperform? Analysing the Learning Efficiency of Agentic Systems | Proposes multi-agent reinforcement learning to improve the learning efficiency of large language models. | reinforcement learning, large language model | | |
| 9 | Learning in Context, Guided by Choice: A Reward-Free Paradigm for Reinforcement Learning with Transformers | Proposes a preference-based reinforcement learning method to address insufficient reward signals. | reinforcement learning | | |
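🔬 Pillar 9: Embodied Foundation Models (6 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 10 | Linearization Explains Fine-Tuning in Large Language Models | Explains the fine-tuning mechanism of large language models through linearization, revealing a link between the NTK spectrum and performance. | large language model | | |
| 11 | ANCRe: Adaptive Neural Connection Reassignment for Efficient Depth Scaling | Proposes ANCRe, an adaptive neural connection reassignment scheme, to make depth scaling of deep models more efficient. | large language model, foundation model | | |
| 12 | Next-Gen CAPTCHAs: Leveraging the Cognitive Gap for Scalable and Diverse GUI-Agent Defense | Proposes a next-generation CAPTCHA framework that exploits the cognitive gap to defend against advanced GUI agents. | multimodal | | |
| 13 | CompilerKV: Risk-Adaptive KV Compression via Offline Experience Compilation | Proposes CompilerKV, which achieves risk-adaptive KV compression via offline experience compilation, improving long-context LLM performance. | large language model | | |
| 14 | LEFT: Learnable Fusion of Tri-view Tokens for Unsupervised Time Series Anomaly Detection | Proposes the LEFT framework for unsupervised time series anomaly detection via learnable fusion of tri-view tokens. | TAMP | | |
| 15 | Near-Oracle KV Selection via Pre-hoc Sparsity for Long-Context Inference | Proposes Pre-hoc Sparsity to address post-hoc bias in KV cache selection for long-context inference. | large language model | | |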
🔬 Pillar 1: Robot Control (3 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 16 | Evasion of IoT Malware Detection via Dummy Code Injection | Evades power side-channel detection of IoT malware by injecting dummy code. | manipulation | | |
| 17 | Reinforcement Learning with Backtracking Feedback | Proposes the RLBF framework, which uses reinforcement learning to dynamically correct LLM generation errors and improve model safety. | manipulation, reinforcement learning, large language model | | |
| 18 | Stateless Yet Not Forgetful: Implicit Memory as a Hidden Channel in LLMs | Reveals implicit memory in LLMs: outputs serve as a hidden channel that preserves state across interactions. | manipulation, large language model | ✅ | |
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 19 | Spherical Steering: Geometry-Aware Activation Rotation for Language Models | Proposes Spherical Steering, a geometry-aware activation rotation that improves inference-time control of language models. | geometric consistency | | |