cs.CL (2026-02-11)
📊 24 papers in total | 🔗 2 with code
🎯 Interest Area Navigation
🔬 Pillar 9: Embodied Foundation Models (16 papers)
🔬 Pillar 2: RL & Architecture (8 papers)
| # | Title | One-Line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 17 | How Do Decoder-Only LLMs Perceive Users? Rethinking Attention Masking for User Representation Learning | Proposes gradient-guided soft masking to improve user representation learning in decoder-only LLMs | representation learning, contrastive learning, large language model | ✅ | |
| 18 | Safety Recovery in Reasoning Models Is Only a Few Early Steering Steps Away | SafeThink: recovers safety in reasoning models through early steering steps | reinforcement learning, multimodal, chain-of-thought | | |
| 19 | DataChef: Cooking Up Optimal Data Recipes for LLM Adaptation via Reinforcement Learning | DataChef: uses reinforcement learning to automatically generate optimal data recipes for LLM adaptation | reinforcement learning, large language model | | |
| 20 | Reinforced Curriculum Pre-Alignment for Domain-Adaptive VLMs | Proposes RCPA, a reinforced curriculum pre-alignment method that improves the performance of domain-adaptive vision-language models | reinforcement learning, large language model, multimodal | | |
| 21 | Neuro-Symbolic Synergy for Interactive World Modeling | Proposes NeSyS, a neuro-symbolic synergy framework that improves the expressiveness and robustness of interactive world modeling | world model, large language model | | |
| 22 | Deep Learning-based Method for Expressing Knowledge Boundary of Black-Box LLM | Proposes LSCL, a deep learning-based method for expressing the knowledge boundary of black-box LLMs | distillation, large language model | | |
| 23 | Online Causal Kalman Filtering for Stable and Effective Policy Optimization | Proposes an online causal Kalman filtering algorithm for policy optimization that addresses unstable importance sampling in LLM reinforcement learning | reinforcement learning, large language model | | |
| 24 | Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters | Step 3.5 Flash: delivers frontier-level agentic capability with 11B active parameters, balancing reasoning and efficiency | reinforcement learning IMoS |