cs.CL (2026-05-26)

📊 43 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (29 papers, 🔗 3 with code) | Pillar 2: RL Algorithms & Architecture (13 papers, 🔗 1 with code) | Pillar 1: Robot Control (1 paper)

🔬 Pillar 9: Embodied Foundation Models (29 papers)

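# | Title | One-line summary | Tags | 🔗
1 | QUACK: Questioning, Understanding, and Auditing Communicated Knowledge in Multimodal Social Deduction Agents | QUACK: questioning, understanding, and auditing communicated knowledge in multimodal social-deduction agents | large language model, multimodal |
2 | Beyond Questions: Evaluating What Large Language Models (Actually) Know | Proposes BeQu, an open knowledge evaluation framework for comprehensively assessing the knowledge held by large language models | large language model |
3 | The Labyrinth and the Thread: Rethinking Regularizations in Sequential Knowledge Editing for Large Language Models | Revisits regularization methods in sequential knowledge editing for LLMs, simplifying them and improving editing stability | large language model |
4 | AlbanianLLMSafety: A Safety Evaluation Dataset for Large Language Models in Albanian | Introduces AlbanianLLMSafety, the first Albanian-language LLM safety evaluation dataset, advancing LLM safety for low-resource languages | large language model |
5 | KZ-SafetyPrompts: A Kazakh Safety Evaluation Prompt Dataset for Large Language Models | Introduces KZ-SafetyPrompts, a Kazakh prompt dataset for evaluating the safety of large language models | large language model |
6 | Rethinking the Multilingual Reasoning Gap with Layer Swap | Proposes Layer Swap, a method that improves the reasoning ability of multilingual LLMs in non-English settings | large language model, chain-of-thought |
7 | Tracing Computation Density in LLMs | Proposes s-Trace, revealing how computation density is distributed across LLMs and how they are organized into modules | large language model |
8 | Evaluating the Relevance of Uncertainty Estimators for LLM Hallucination | Evaluates how relevant uncertainty estimators are for detecting LLM hallucinations | large language model |
9 | Why Prompt Optimization Works, and Why It Sometimes Doesn't: A Causal-Inspired Edit-Level Analysis | Uses a causal-inspired, edit-level analysis to explain why prompt optimization succeeds and why it sometimes fails | large language model |
10 | Lost in Sampling: Assessing Lexical Reachability in LLMs via the Word Coverage Score (WCS) | Proposes the Word Coverage Score (WCS) to assess how LLM sampling strategies affect lexical richness | large language model |
11 | JuICE: A Benchmark for Evaluating LLM-Judge in Identifying Cultural Errors | JuICE: a benchmark evaluating how well LLMs acting as judges identify cultural errors | large language model |
12 | ContextGuard: Structured Self-Auditing for Context Learning in Language Models | ContextGuard: a structured self-auditing method that improves language models' performance in context learning | large language model |
13 | Annotator Positionality as Signal: Psychometric Weighting for Anti-Autistic Ableism Detection | Proposes a psychometric weighting framework based on annotator positionality for detecting anti-autistic ableist speech | large language model |
14 | Real Images, Worse Judgments: Evaluating Vision-Language Models on Concreteness and Imagery | Shows that vision-language models are easily distracted by image context when making lexical judgments, which lowers their agreement with human ratings | multimodal |
15 | ENPMR-Bench: Benchmarking Proactive Memory Retrieval for Emotional Support Agents | Proposes ENPMR-Bench, a benchmark for assessing proactive memory retrieval in emotional-support dialogue | chain-of-thought |
16 | Temporal Simultaneity Predicts Annotation Quality in Sentiment Corpora | Analyzes how temporal simultaneity affects annotation quality in sentiment corpora and builds a Setswana corpus | TAMP |
17 | BAIT: Boundary-Guided Disclosure Escalation via Self-Conditioned Reasoning | BAIT: jailbreaking large language models through self-conditioned reasoning and boundary-guided disclosure escalation | large language model |
18 | On the Hidden Costs of Counterfactual Knowledge Training in LLM Unlearning | Reveals and mitigates the hidden knowledge conflicts and spreading hallucinations caused by counterfactual knowledge training in LLM unlearning | large language model |
19 | Share More, Search Less: Collaborative Parallel Thinking for Efficient Test-Time Scaling | Proposes the Collaborative Parallel Thinking (CPT) framework to make test-time reasoning in large language models more efficient | large language model |
20 | PersLitEval: Fine-grained Benchmark and Evaluation of LLMs on Persian Literature Questions | PersLitEval: a fine-grained Persian literature benchmark for evaluating large language models | large language model |
21 | Prompt Injection Detection is Regime-Dependent: A Deployment-Aware Evaluation with Interpretable Structural Signals | Proposes a deployment-aware evaluation framework for prompt injection attacks, plus a detection method based on interpretable structural signals | large language model |
22 | Accountable Human-AI Deliberation with LLMs: Scaling Collective Intelligence through Symbiotic Scaffolding | Proposes an LLM-based symbiotic human-AI collaboration framework that scales collective intelligence while preserving accountability | large language model |
23 | Quality Without Usefulness: LLM-Generated XAI Narratives as Trust Heuristics Rather Than Decision Aids | LLM-generated explainable-AI narratives fail to improve decision usefulness and instead serve as trust heuristics | large language model |
24 | LATTE: Forecasting Peer Anchored Preference Trajectories for Personalized LLM Generation | LATTE: forecasting peer-anchored preference trajectories for personalized LLM generation | large language model |
25 | Verilog-Evolve: Feedback-Driven and Skill-Evolving Verilog Generation | Verilog-Evolve: a feedback-driven, skill-evolving framework for Verilog generation | large language model |
26 | Model Unlearning Objectives Vary for Distinct Language Functions | Proposes distinct LLM unlearning objectives for different language functions, improving unlearning effectiveness | large language model |
27 | Probing Minimalist Phase Structure in LLMs: What Universal Dependencies Cannot Represent | Probes Minimalist phase structure in LLMs, revealing information that Universal Dependencies cannot represent | large language model |
28 | Slide Deck Q&A Quality Assurance App: A Multi-Stage Pipeline for Pedagogical Question Generation | Presents the Slide Deck Q&A Quality Assurance system for generating high-quality pedagogical questions from slides | large language model |
29 | Towards Just-in-Time Adaptive Feedback: Enhancing Student Learning via Knowledge-Grounded LLM | Proposes a knowledge-grounded LLM framework that delivers just-in-time adaptive feedback to improve student learning | large language model |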

🔬 Pillar 2: RL Algorithms & Architecture (13 papers)

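# | Title | One-line summary | Tags | 🔗
30 | GeoFaith: A Spatio-Temporal Dual View of Faithful Chain-of-Thought | GeoFaith: a spatio-temporal dual-view framework for faithful chain-of-thought reasoning | reinforcement learning, large language model, chain-of-thought |
31 | SeDT: Sentence-Transformer Decision-Transformer Conditioning for Multi-Turn Conversation Reliability | Proposes SeDT, which conditions a Decision Transformer on Sentence-Transformer representations to improve multi-turn conversation reliability | reinforcement learning, offline reinforcement learning, decision transformer |
32 | Large Language Model-Powered Query-Driven Event Timeline Summarization in Industrial Search | QDET: an LLM-based, query-driven event timeline summarization system that improves industrial search | reinforcement learning, large language model |
33 | Optimising Factual Consistency in Summarisation via Preference Learning from Multiple Imperfect Metrics | Proposes a method that optimizes factual consistency in summarization through preference learning from multiple imperfect metrics | reinforcement learning, preference learning, reward shaping |
34 | Not All Tokens Matter Equally: Dynamic In-context Vector Distillation with Decisive-Token Supervision for Long-form Medical Report Generation | DIVE: dynamic in-context vector distillation with decisive-token supervision for generating long-form medical reports | distillation, multimodal |
35 | Pair-In, Pair-Out: Latent Multi-Token Prediction for Efficient LLMs | Proposes PIPO, which accelerates LLM inference by combining latent-space compression with multi-token prediction | distillation, large language model, chain-of-thought |
36 | MAIGO: Mitigating Lost-in-Conversation with History-Cleaned On-Policy Self-Distillation | Proposes MAIGO, which uses history-cleaned on-policy self-distillation to mitigate information loss over the course of a conversation | distillation, large language model |
37 | Tournament-GRPO: Group-Wise Tournament Rewards for Reinforcement Learning in Open-Ended Long-Form Generation | Proposes Tournament-GRPO, which uses group-wise tournament rewards to improve reinforcement learning for open-ended long-form generation | reinforcement learning, reward design |
38 | EmoDistill: Offline Emotion Skill Distillation for Language Model Agents in Adversarial Negotiation | Proposes EmoDistill, an offline emotion-skill distillation method for language-model agents in emotion-driven adversarial negotiation | IQL, distillation |
39 | Efficient Agentic Reinforcement Learning with On-Policy Intrinsic Knowledge Boundary Enhancement | Proposes AKBE, which dynamically probes knowledge boundaries to make LLM agents use tools more efficiently in agentic RL | reinforcement learning, reward shaping |
40 | It's Not Always Sycophancy: Measuring LLM Conformity as a Function of Epistemic Uncertainty | The MUSE framework shows that LLM conformity depends on epistemic uncertainty and is not simply sycophancy | reinforcement learning, large language model |
41 | LitSeg: Narrative-Aware Document Segmentation for Literary RAG | Proposes LitSeg, which applies narrative theory to segment literary works for RAG, improving both retrieval and generation | distillation, large language model |
42 | Learning to Adapt SFT Data for Better Reasoning Generalization | Proposes DART, which adapts SFT data to improve the reasoning generalization of LLMs | reinforcement learning, large language model |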

🔬 Pillar 1: Robot Control (1 paper)

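# | Title | One-line summary | Tags | 🔗
43 | ExTax: Explainable Disinformation Detection via Persuasion, Emotion, and Narrative Role Taxonomies | ExTax: an explainable disinformation detection framework built on taxonomies of persuasion, emotion, and narrative roles | manipulation |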
