cs.CL (2026-05-29)

📊 48 papers in total | 🔗 13 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (38 🔗10) · Pillar 2: RL Algorithms & Architecture (10 🔗3)

🔬 Pillar 9: Embodied Foundation Models (38 papers)

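| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 1 | BenHalluEval: A Multi-Task Hallucination Evaluation Framework for Large Language Models on Bengali | BenHalluEval: a multi-task framework for evaluating LLM hallucination in Bengali. | large language model, chain-of-thought | |
| 2 | Skill Availability and Presentation Granularity in Large-Language-Model Agents: A Controlled SkillsBench Study | Studies how the granularity of skill documentation affects LLM agent task success and finds that skill availability is the key factor. | large language model | |
| 3 | The Sword, Shield, and Achilles' Heel: Characterizing the Linguistic Inductive Bias of Large Language Models for Spatial Reasoning in Navigation Planning | Proposes a dual-intervention framework to evaluate the linguistic inductive bias of LLMs in spatial reasoning for navigation planning. | large language model | |
| 4 | Target-Side Paraphrase Augmentation for Sign Language Translation with Large Language Models | Proposes a Signformer-based sign language translation method with GPT-4o target-side paraphrase augmentation, improving performance in low-resource settings. | large language model | |
| 5 | Do Large Language Models Encode Institutional Experience? Evidence from Cross-Linguistic Moral Reasoning Under Ambiguity | Shows that LLMs exhibit traces of institutional experience in cross-linguistic moral reasoning. | large language model | |
| 6 | Human-Alignment, Calibration, and Activation Patterns in Large Language Model Uncertainty | Explores how LLM uncertainty relates to human alignment, calibration, and activation patterns. | large language model | |
| 7 | Semantic Triplet Restoration: A Novel Protocol for Hierarchical Table Understanding in Large Language Models | Proposes a semantic triplet restoration protocol that improves LLM performance on hierarchical table understanding. | large language model | |
| 8 | EvoDefense: Co-Evolving Black-Box Defense with Large Language Models | EvoDefense: a co-evolving black-box defense method based on LLMs. | large language model | |
| 9 | TeachObs: A Human-Validated Benchmark for Multimodal Teaching Observation and Model Evaluation | Proposes TeachObs, a human-validated benchmark for multimodal teaching observation and model evaluation. | multimodal | |
| 10 | DOA: Training-Free Decoder-Only Attention Policy for Long-Form Simultaneous Translation with SpeechLLMs | Proposes DOA, a training-free decoder-only attention policy for long-form simultaneous translation with SpeechLLMs. | large language model, multimodal | |
| 11 | What Am I Missing? Question-Answering as Hidden State Probing | Proposes a hidden-state probing method based on question generation to improve LLM reasoning. | large language model, chain-of-thought | |
| 12 | MineExplorer: Evaluating Open-World Exploration of MLLM Agents in Minecraft | MineExplorer: evaluates the open-world exploration ability of MLLM agents in Minecraft. | large language model, multimodal | |
| 13 | MADS: Model-Aware Diverse Core Set Selection for Instruction Tuning | Proposes a model-aware diverse core set selection method for instruction-tuning data selection. | large language model, instruction following | |
| 14 | FBHM: Functional Benchmarking and Steering of VLMs for Hateful Meme Detection | Proposes the FBHM benchmark and an LSV steering method to improve VLM generalization in hateful meme detection. | multimodal | |
| 15 | Scaling Multi-Hop Training Data via Graph-Constrained Path Selection | Proposes graph-constrained path selection to scale multi-hop training data. | large language model | |
| 16 | Shared Doubt: Zero-shot Cross-Lingual Confidence Estimation for Language Models | Proposes a zero-shot cross-lingual confidence estimation method that exploits confidence features shared across languages in multilingual LLMs. | large language model | |
| 17 | Steering LLMs? Actually, Sparse Autoencoders can outperform simple baselines | With supervised feature selection, sparse autoencoders can rival LoRA on LLM steering tasks. | large language model | |
| 18 | XLGoBench: Detecting cross-lingual skill gaps with algorithmic tasks | XLGoBench: proposes a set of algorithmic tasks to detect cross-lingual skill gaps in LLMs. | large language model | |
| 19 | If LLMs Have Human-Like Attributes, Then So Does Age of Empires II | Questions the attribution of human-like traits to LLMs: similar phenomena can also be observed in Age of Empires II. | large language model | |
| 20 | D$^3$: Dynamic Directional Graph-Constrained Data Scheduling for LLM Training | Proposes the D$^3$ framework, which optimizes LLM training data scheduling with dynamic directional graph constraints to improve learning efficiency. | large language model | |
| 21 | Toxic HallucinAItions: Perturbing Prompts and Tracing LLM Circuits | Shows that toxic words in prompts reduce LLM reliability and reveals the accompanying changes in internal computation. | large language model | |
| 22 | Fine-Tuning Improves Information Conveyance in Language Models | Proposes Canopy Entropy to address the efficiency of information conveyance in language models. | large language model | |
| 23 | Language Models Can Resolve Reference Compositionally, But It's Not Their Native Strength: The Case of the Personal Relation Task | Reveals the limitations of LLMs in compositional reference resolution, using the Personal Relation Task as a case study. | large language model | |
| 24 | Knowledge Boundary Probing and Demand-Guided Intervention for LLM-Based Power System Code Generation | Proposes PowerCodeBench and a knowledge-boundary intervention method to improve LLM reliability in power system code generation. | large language model | |
| 25 | LLM Judges Inconsistently Disagree Across Safety Criteria and Harm Categories | Reveals the inconsistency of LLMs as safety judges, especially in regulated domains such as finance. | large language model | |
| 26 | The Latin Substrate: How Language Models Represent and Mediate Script Choice | Reveals a Latin substrate preference in LLMs, examining how language models represent and mediate script choice. | large language model | |
| 27 | Divergence Decoding: Inference-Time Unlearning via Auxiliary Models | Proposes Divergence Decoding, which uses auxiliary models to achieve inference-time unlearning in LLMs, addressing privacy and copyright risks. | large language model | |
| 28 | Wind Turbine Maintenance Log Labelling Framework: LLM-Driven Data Correction and Enrichment via Semantic Extraction of Reliability Intelligence | Proposes an LLM-based framework for labelling wind turbine maintenance logs, enabling data correction and extraction of reliability information. | large language model | |
| 29 | Multilingual and Cross-Lingual Citation Needed Detection on Wikipedia for Lower-Resource Languages | Proposes the MCN multilingual corpus and uses small language models for Citation Needed detection on Wikipedia in lower-resource languages. | large language model | |
| 30 | GRKV: Global Regression for Training-Free KV Cache Compression in Long-Context LLMs | GRKV: training-free KV cache compression for long-context LLMs via global regression. | large language model | |
| 31 | Beyond Static Dialogues: Benchmarking Realistic, Heterogeneous, and Evolving Long-Term Memory | Proposes the RHELM benchmark for evaluating LLMs on realistic, heterogeneous, and evolving long-term memory scenarios. | large language model | |
| 32 | How Much Do LLMs Know About Chinese Zero Pronouns? | Systematically evaluates how well LLMs understand Chinese zero pronouns. | large language model | |
| 33 | MoG: Mixture of Experts for Graph-based Retrieval-Augmented Generation | Proposes MoG, a mixture-of-experts model for graph-based retrieval-augmented generation that improves complex reasoning performance. | large language model | |
| 34 | EvoGens: A Population-Based Heuristic Search Framework for Scientific Idea Generation | EvoGens: a population-based heuristic search framework for scientific idea generation. | large language model | |
| 35 | dMoE: dLLMs with Learnable Block Experts | dMoE: proposes a learnable block expert mechanism to resolve the mismatch between expert selection and block-parallel decoding in diffusion language models. | large language model | |
| 36 | Incremental BPE Tokenization | Proposes an incremental BPE tokenization algorithm to improve streaming efficiency. | large language model | |
| 37 | Efficient Diffusion LLMs via Temporal-Spatial Parallel Decoding and Confidence Extrapolation | Proposes temporal-spatial parallel decoding and confidence extrapolation to accelerate inference in diffusion language models. | large language model | |
| 38 | Triaging Threats to Specialized Guardrails | Proposes RouteGuard, a specialized safety guardrail scheme built on a router-expert framework. | large language model | |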

🔬 Pillar 2: RL Algorithms & Architecture (10 papers)

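| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 39 | AdaptR1: Reinforcement Learning Based Adaptive Interleaved Thinking in Multi-hop Question Answering | AdaptR1: reinforcement-learning-based adaptive interleaved thinking for multi-hop question answering. | reinforcement learning, large language model, chain-of-thought | |
| 40 | PatchWorld: Gradient-Free Optimization of Executable World Models | PatchWorld: improves planning in text-based agent environments through gradient-free optimization of executable world models. | world model, world models | |
| 41 | LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards | Proposes LongTraceRL, which learns long-context reasoning from search agent trajectories with rubric rewards, improving LLMs' ability to integrate information in complex contexts. | reinforcement learning, reward design, large language model | |
| 42 | Are Full Rollouts Necessary for On-Policy Distillation? | Proposes progressive and truncated strategies that improve the training efficiency of on-policy distillation for long-sequence reasoning. | reinforcement learning, distillation | |
| 43 | Your Teacher Can't Help You Here: Combating Supervision Fidelity Decay in On-Policy Distillation | Proposes Lookahead Group Reward to address supervision fidelity decay. | distillation | |
| 44 | Preference-Aware Rubric Learning for Personalized Evaluation | Proposes the PARL framework, which achieves personalized evaluation through preference-aware rubric learning. | reinforcement learning, large language model | |
| 45 | Reinforcement Learning Amplifies Emergent Misalignment from Harmless Rewards | Shows that reinforcement learning amplifies emergent misalignment arising from harmless rewards. | reinforcement learning | |
| 46 | ConsisGuard: Aligning Safety Deliberation with Policy Enforcement in LLM Guardrails | ConsisGuard: aligns safety deliberation with policy enforcement in LLM guardrails, improving safety and reliability. | distillation, chain-of-thought | |
| 47 | Combinatorial Synthesis: Scaling Code RLVR via Atomic Decomposition and Recombination | Proposes the Atomic Decomposition and Recombination (ADR) framework to improve the scalability of code RLVR. | reinforcement learning, large language model | |
| 48 | The Flip Side of RLHF: On-Policy Feedback for Reward Model Self-Supervised Improvement | Proposes the SAVE framework, which uses the policy's value function to improve the reward model in a self-supervised way, addressing the reward model training data bottleneck. | RLHF | |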
