cs.LG(2026-02-09)

📊 19 papers total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (9 🔗2) | Pillar 9: Embodied Foundation Models (6) | Pillar 1: Robot Control (3 🔗1) | Pillar 7: Motion Retargeting (1)

🔬 Pillar 2: RL & Architecture (9 papers)

# | Title | One-sentence summary | Tags | 🔗
1 | AnomSeer: Reinforcing Multimodal LLMs to Reason for Time-Series Anomaly Detection | AnomSeer strengthens the time-series reasoning of multimodal LLMs to tackle anomaly detection. | reinforcement learning, large language model, multimodal |
2 | Contextual Rollout Bandits for Reinforcement Learning with Verifiable Rewards | Proposes Contextual Rollout Bandits to improve mathematical reasoning in reinforcement learning with verifiable rewards. | reinforcement learning, large language model |
3 | Dreaming in Code for Curriculum Learning in Open-Ended Worlds | DiCode: uses code-generated environments for curriculum learning to improve open-world agent capabilities. | curriculum learning, foundation model |
4 | Bayesian Preference Learning for Test-Time Steerable Reward Models | Proposes Variational In-Context Reward Modeling (ICRM) for reward models that can be steered at test time. | reinforcement learning, preference learning |
5 | StealthRL: Reinforcement Learning Paraphrase Attacks for Multi-Detector Evasion of AI-Text Detectors | StealthRL: a reinforcement-learning-based adversarial paraphrase attack on AI-text detectors. | reinforcement learning |
6 | Dr. MAS: Stable Reinforcement Learning for Multi-Agent LLM Systems | Dr. MAS: a stable reinforcement learning training framework for multi-agent LLM systems. | reinforcement learning |
7 | LLaDA2.1: Speeding Up Text Diffusion via Token Editing | LLaDA2.1: speeds up text diffusion model inference via token editing, balancing speed and quality. | reinforcement learning, instruction following |
8 | When Do Multi-Agent Systems Outperform? Analysing the Learning Efficiency of Agentic Systems | Proposes multi-agent reinforcement learning to improve the learning efficiency of large language models. | reinforcement learning, large language model |
9 | Learning in Context, Guided by Choice: A Reward-Free Paradigm for Reinforcement Learning with Transformers | Proposes a preference-based reinforcement learning method to address insufficient reward signals. | reinforcement learning |

🔬 Pillar 9: Embodied Foundation Models (6 papers)

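# | Title | One-sentence summary | Tags | 🔗
10 | Linearization Explains Fine-Tuning in Large Language Models | Explains the fine-tuning mechanism of large language models through linearization, revealing a link between the NTK spectrum and performance. | large language model |
11 | ANCRe: Adaptive Neural Connection Reassignment for Efficient Depth Scaling | Proposes ANCRe, an adaptive neural connection reassignment method, to make depth scaling of deep models more efficient. | large language model, foundation model |
12 | Next-Gen CAPTCHAs: Leveraging the Cognitive Gap for Scalable and Diverse GUI-Agent Defense | Proposes a next-generation CAPTCHA framework that exploits the cognitive gap to defend against advanced GUI agents. | multimodal |
13 | CompilerKV: Risk-Adaptive KV Compression via Offline Experience Compilation | Proposes CompilerKV, which achieves risk-adaptive KV compression via offline experience compilation, improving long-context LLM performance. | large language model |
14 | LEFT: Learnable Fusion of Tri-view Tokens for Unsupervised Time Series Anomaly Detection | Proposes the LEFT framework for unsupervised time-series anomaly detection via learnable fusion of tri-view tokens. | TAMP |
15 | Near-Oracle KV Selection via Pre-hoc Sparsity for Long-Context Inference | Proposes Pre-hoc Sparsity to address post-hoc bias in KV cache selection for long-context inference. | large language model |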

🔬 Pillar 1: Robot Control (3 papers)

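# | Title | One-sentence summary | Tags | 🔗
16 | Evasion of IoT Malware Detection via Dummy Code Injection | Evades power side-channel detection of IoT malware by injecting dummy code. | manipulation |
17 | Reinforcement Learning with Backtracking Feedback | Proposes the RLBF framework, which uses reinforcement learning to dynamically correct LLM generation errors, improving model safety. | manipulation, reinforcement learning, large language model |
18 | Stateless Yet Not Forgetful: Implicit Memory as a Hidden Channel in LLMs | Reveals implicit memory in LLMs: outputs serve as a hidden channel that preserves state across interactions. | manipulation, large language model |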

🔬 Pillar 7: Motion Retargeting (1 paper)

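# | Title | One-sentence summary | Tags | 🔗
19 | Spherical Steering: Geometry-Aware Activation Rotation for Language Models | Proposes Spherical Steering, which improves inference-time control of language models through geometry-aware activation rotation. | geometric consistency |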
