cs.CL (2026-05-25)

📊 36 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (24) · Pillar 2: RL & Architecture (11 🔗4) · Pillar 1: Robot Control (1)

🔬 Pillar 9: Embodied Foundation Models (24 papers)

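# | Title | One-sentence summary | Tags | 🔗
1 | Automated Benchmark Auditing for AI Agents and Large Language Models | Proposes the Auto Benchmark Audit (ABA) framework, which automatically audits AI benchmarks to improve evaluation quality. | large language model
2 | Creative Quality Alignment: Expert Tacit Knowledge Transfer via Chain-of-Thought Fine-Tuning | Transfers expert tacit knowledge via chain-of-thought fine-tuning to achieve creative quality alignment. | chain-of-thought
3 | Toward a Benchmark for Controllable Simulation of Imperfect Students with Large Language Models | Proposes a benchmark for controllable learner simulation that uses large language models to simulate students with specific skill deficits, for teacher training. | large language model
4 | The Age of Curiosity Meets the Age of AI: Benchmarking Child Safety in Large Language Models | KIDBench: a benchmark and safety model for evaluating large language models on child safety. | large language model
5 | A general tensor-structured compression scheme for efficient large language models | Proposes MixT, a general tensor-structured compression scheme for efficiently compressing large language models. | large language model
6 | MATO: Multi-objective Personalized Alignment with Test-time Optimization for Large Language Models | Proposes MATO, a framework for multi-objective personalized alignment of large language models based on test-time optimization. | large language model
7 | When Do LLM Agents Treat Surface Noise Differently from Semantic Noise? A 68-Cell Measurement Study with a Held-Out Trace-Level Validation | Shows that LLM agents are more sensitive to semantic noise than to surface noise, and reveals a potential mechanism of reasoning divergence. | large language model, chain-of-thought
8 | Double Triangle Annotation: A Scalable Human-in-the-Loop Framework for High-Precision Historical Document Annotation | Proposes the Double Triangle Annotation framework, which uses consensus among multimodal large models to achieve high-precision annotation of historical documents. | large language model, multimodal
9 | WhoSaidIt: Human-LLM Collaborative Annotation for Text-Based Multilingual Speaker-Attribute Classification | Proposes WhoSaidIt, a human-LLM collaborative annotation framework for text-based multilingual speaker-attribute classification. | large language model
10 | QUIET: A Multi-Blank Cascaded Story Cloze Benchmark for LLM Creative Generation Capability | QUIET: a multi-blank cascaded story cloze benchmark for evaluating the creative generation capability of LLMs. | large language model
11 | Thaka at KSAA-2026 Task 2: Regularized Fine-Tuning for Arabic Speech Diacritization | For Arabic diacritic restoration, proposes a CATT-Whisper model based on regularized fine-tuning. | multimodal
12 | Causal Tongue-Tie: LLMs Can Encode Causal Direction, But Their Yes/No Outputs Fail to Express | Reveals a "tip-of-the-tongue" effect in LLM causal reasoning: internal understanding and external expression are inconsistent. | large language model
13 | TIAR: Trajectory-Informed Advantage Reweighting for LLM Abstention Learning | TIAR: trajectory-informed advantage reweighting for LLM abstention learning, improving model reliability. | large language model
14 | Clarify, Abstain or Answer? Strategising in Conversation with Belief-Augmented Generation | Proposes Belief-Augmented Generation (BAG), improving an LLM's ability to clarify, answer, or abstain in conversational question answering. | large language model
15 | StreamProfileBench: A Benchmark for Fine-Grained User Profile Inference in Real-World Streaming Scenarios | StreamProfileBench: proposes a large-scale streaming user-profile benchmark that addresses the challenge of modeling evolving user interests in real-time scenarios. | large language model
16 | PowLU: An Activation Function for Stable Pre-Training of LLMs | Proposes the PowLU activation function to address numerical stability issues in LLM pre-training. | large language model
17 | Neural Router: Semantic Content Matching for Agentic AI | Proposes Neural Router, which uses LLMs for semantic content matching to support agentic AI. | large language model
18 | PennySynth: RAG-Driven Data Synthesis for Automated Quantum Code Generation | PennySynth: a RAG-based data synthesis framework for automated quantum code generation. | large language model
19 | IndexMem: Learned KV-Cache Eviction with Latent Memory for Long-Context LLM Inference | IndexMem: learns a KV-cache eviction policy with latent memory to improve long-context LLM inference performance. | large language model
20 | HyLaT: Efficient Multi-Agent Communication via Hybrid Latent-Text Protocol | HyLaT: proposes a hybrid latent-text protocol to improve the efficiency of multi-agent communication. | large language model
21 | SomaliBench Eval: Measuring English-to-Somali Refusal Gaps in Open-Weight Language Models | The SomaliBench Eval evaluation reveals significant gaps in the Somali-language refusal behavior of open-weight language models. | large language model
22 | LLM-as-a-Reviewer: Benchmarking Their Ability, Divergence, and Prompt Injection Resistance as Paper Reviewers | Evaluates LLMs as paper reviewers: a benchmark study of bias, divergence, and prompt-injection resistance. | large language model
23 | EfficientGraph-RAG: Structured Retrieval-State Management for Cross-Task Retrieval-Augmented Generation | EfficientGraph-RAG: improves cross-task RAG efficiency through structured retrieval-state management. | large language model
24 | Tool-Call Dependency Structure is Linearly Decodable in LLM Agent Residual Streams | Uses probes to decode runtime tool-call dependencies in LLM agents, revealing their linearly decodable structure. | chain-of-thought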

🔬 Pillar 2: RL & Architecture (11 papers)

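# | Title | One-sentence summary | Tags | 🔗
25 | BC Protocol: Structured Dual-Expert Dialogue for Eliciting High-Quality Chain-of-Thought Post-Training Data | Proposes the BC Protocol, which generates high-quality chain-of-thought post-training data through structured dual-expert dialogue. | RLHF, large language model, chain-of-thought
26 | Can LLMs Time Travel? Enhancing Temporal Consistency in Legal Agentic Search through Reinforcement Learning | Proposes the LegalSearch-R1 framework, which uses reinforcement learning to improve the temporal consistency of legal agents. | reinforcement learning, large language model
27 | Selective Latent Thinking: Adaptive Compression of LLM Reasoning Chains | Proposes Selective Latent Thinking (SLT), which adaptively compresses LLM reasoning chains to improve efficiency. | reinforcement learning, large language model, chain-of-thought
28 | DVAO: Dynamic Variance-adaptive Advantage Optimization for Multi-reward Reinforcement Learning | DVAO: dynamic variance-adaptive advantage optimization, improving the stability and performance of multi-reward reinforcement learning. | reinforcement learning, large language model
29 | GeoSVG-RL: Geometry-Aware Reinforcement Learning for Layout-Constrained Text-to-SVG Diagram Generation | GeoSVG-RL: proposes a geometry-aware reinforcement learning framework for layout-constrained text-to-SVG diagram generation. | reinforcement learning, large language model
30 | Language Models Need Sleep | Proposes a sleep mechanism to address the computational bottleneck of Transformers on long-sequence tasks. | SSM, large language model
31 | SafeCtrl-RL: Inference-Time Adaptive Behaviour Control for LLM Dialogue via RL-Driven Prompt Optimisation | SafeCtrl-RL: achieves inference-time adaptive behaviour control for LLM dialogue through RL-driven prompt optimisation. | reinforcement learning, large language model
32 | Reinforcement Learning from Denoising Feedback | Proposes a reinforcement learning method based on denoising feedback to address the policy loss estimation problem. | reinforcement learning
33 | CRPO: Character-centric Group Relative Policy Optimization for Role-aware Reasoning in Role-playing Agents | Proposes CRPO to address character consistency and style collapse in role-playing agents. | reinforcement learning, large language model
34 | Learning to Route Languages for Multilingual Policy Optimization | Proposes language-routing policy optimization (LRPO), which makes more efficient use of cross-lingual knowledge in multilingual policy optimization. | reinforcement learning, large language model
35 | Peak-Then-Collapse and the Four Interface Channels of Knowledge-Graph Tool Use | Reveals a "peak-then-collapse" phenomenon in knowledge-graph tool use and examines the influence of its interface channels. | distillation, reward design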

🔬 Pillar 1: Robot Control (1 paper)

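# | Title | One-sentence summary | Tags | 🔗
36 | GeoMathCode: Understanding Interleaved Math-Code Reasoning for Geometry Problem Solving | Proposes GeoMathCode, which uses program code as an intermediate visual representation for geometry problem solving. | manipulation, large language model, multimodal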
