cs.CL (2026-01-07)

📊 48 papers in total | 🔗 7 with code

🎯 Areas of Interest

Pillar 9: Embodied Foundation Models (42 🔗6)
Pillar 2: RL Algorithms & Architecture (6 🔗1)

🔬 Pillar 9: Embodied Foundation Models (42 papers)

# | Title | One-line summary | Tags
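1 | Layer-Order Inversion: Rethinking Latent Multi-Hop Reasoning in Large Language Models | Reveals a layer-order inversion phenomenon in the latent multi-hop reasoning of LLMs and proposes a probabilistic recall-and-extract framework | large language model, chain-of-thought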
2 | Mem-Gallery: Benchmarking Multimodal Long-Term Conversational Memory for MLLM Agents | Proposes the Mem-Gallery benchmark for evaluating the long-term conversational memory of multimodal LLM agents | large language model, multimodal
3 | When Helpers Become Hazards: A Benchmark for Analyzing Multimodal LLM-Powered Safety in Daily Life | Proposes SaLAD, a benchmark for evaluating the safety of multimodal LLMs in daily life | large language model, multimodal
4 | DiffCoT: Diffusion-styled Chain-of-Thought Reasoning in LLMs | DiffCoT applies diffusion-style modeling to chain-of-thought reasoning in LLMs, improving robustness and error correction | large language model, chain-of-thought
5 | ContextFocus: Activation Steering for Contextual Faithfulness in Large Language Models | ContextFocus: an activation-steering method that improves contextual faithfulness in LLMs | large language model
6 | Value-Action Alignment in Large Language Models under Privacy-Prosocial Conflict | Proposes a context-based evaluation protocol to analyze the value-action alignment of LLMs under privacy-prosocial conflicts | large language model
7 | Bridging the Discrete-Continuous Gap: Unified Multimodal Generation via Coupled Manifold Discrete Absorbing Diffusion | Proposes CoM-DAD, which achieves unified multimodal generation via coupled-manifold discrete absorbing diffusion | multimodal
8 | RedBench: A Universal Dataset for Comprehensive Red Teaming of Large Language Models | RedBench: a universal dataset for comprehensively evaluating the adversarial robustness of LLMs | large language model
9 | PsychEthicsBench: Evaluating Large Language Models Against Australian Mental Health Ethics | PsychEthicsBench: evaluates how well LLMs adhere to Australian mental health ethics | large language model
10 | How Do Large Language Models Learn Concepts During Continual Pre-Training? | Studies how LLMs learn and forget concepts during continual pre-training | large language model
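11 | From Implicit to Explicit: Token-Efficient Logical Supervision for Mathematical Reasoning in LLMs | Proposes the FSLR framework, which uses explicit logical supervision to improve the token efficiency of LLMs in mathematical reasoning | large language model, chain-of-thought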
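12 | Towards Compositional Generalization of LLMs via Skill Taxonomy Guided Data Synthesis | Proposes the STEPS framework, which improves the compositional generalization of LLMs through skill-taxonomy-guided data synthesis | large language model, instruction following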
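13 | From Chains to Graphs: Self-Structured Reasoning for General-Domain LLMs | Proposes the Self-structured Graph Reasoning (SGR) framework to improve the reasoning consistency of LLMs in open-domain question answering | large language model, chain-of-thought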
14 | HearSay Benchmark: Do Audio LLMs Leak What They Hear? | The HearSay benchmark reveals serious voice-privacy leakage risks in audio LLMs | large language model, chain-of-thought
15 | Do LLM Self-Explanations Help Users Predict Model Behavior? Evaluating Counterfactual Simulatability with Pragmatic Perturbations | Evaluates whether LLM self-explanations help users predict model behavior under counterfactuals, using pragmatic perturbations | large language model, chain-of-thought
16 | Persona-aware and Explainable Bikeability Assessment: A Vision-Language Model Approach | Proposes a persona-aware, explainable vision-language model for assessing bikeability | chain-of-thought
17 | NeuronScope: A Multi-Agent Framework for Explaining Polysemantic Neurons in Language Models | NeuronScope: a multi-agent framework for explaining polysemantic neurons in language models | large language model
18 | AI Generated Text Detection | Evaluates AI-generated text detection methods and proposes a topic-partitioned benchmark to improve model generalization | large language model
19 | Do LLMs Really Memorize Personally Identifiable Information? Revisiting PII Leakage with a Cue-Controlled Memorization Framework | Proposes a Cue-Resistant Memorization framework for evaluating PII leakage in LLMs | large language model
20 | Membox: Weaving Topic Continuity into Long-Range Memory for LLM Agents | Membox: strengthens the long-term memory of LLM agents by weaving in topic continuity | large language model
21 | Evaluation of Multilingual LLMs Personalized Text Generation Capabilities Targeting Groups and Social-Media Platforms | Evaluates the personalized text generation of multilingual LLMs for specific target groups and social-media platforms | large language model
22 | ADEPT: Adaptive Dynamic Early-Exit Process for Transformers | ADEPT: an adaptive dynamic early-exit mechanism for Transformers that improves inference efficiency | large language model
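23 | Evaluating LLMs for Police Decision-Making: A Framework Based on Police Action Scenarios | Proposes the PAS framework for evaluating LLMs in police decision-making, filling a gap left by the lack of existing evaluation frameworks | large language model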
24 | Reasoning Pattern Alignment Merging for Adaptive Reasoning | Proposes RPAM, a feature-alignment-based model-merging framework for adaptive reasoning | chain-of-thought
25 | Beyond Perplexity: A Lightweight Benchmark for Knowledge Retention in Supervised Fine-Tuning | Proposes KR-Test for evaluating the knowledge retention of LLMs during SFT, distinguishing factual learning from linguistic imitation | large language model
26 | LLMberjack: Guided Trimming of Debate Trees for Multi-Party Conversation Creation | LLMberjack: a platform for guided trimming of debate trees to create multi-party conversations | large language model
27 | Modular Prompt Optimization: Optimizing Structured Prompts with Section-Local Textual Gradients | Modular Prompt Optimization (MPO): optimizes structured prompts using section-local textual gradients | large language model
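28 | Simulated Students in Tutoring Dialogues: Substance or Illusion? | Proposes an evaluation framework for the student-simulation task, revealing the limitations of LLMs when simulating students in tutoring dialogues | large language model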
29 | Benchmark^2: Systematic Evaluation of LLM Benchmarks | Benchmark^2: a framework for systematically evaluating the quality of LLM benchmarks, improving evaluation efficiency | large language model
30 | SoK: Privacy Risks and Mitigations in Retrieval-Augmented Generation Systems | Systematically analyzes privacy risks and mitigations in RAG systems, building a risk taxonomy and a map of the RAG pipeline | large language model
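31 | Decide Then Retrieve: A Training-Free Framework with Uncertainty-Guided Triggering and Dual-Path Retrieval | Proposes the DTR framework, which improves retrieval-augmented generation for open-domain QA through uncertainty-guided triggering and dual-path retrieval | large language model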
32 | Evaluating Small Decoder-Only Language Models for Grammar Correction and Text Simplification | Evaluates how small decoder-only language models perform on grammar correction and text simplification | large language model
33 | Atlas: Orchestrating Heterogeneous Models and Tools for Multi-Domain Complex Reasoning | Proposes ATLAS to address the dynamic selection of heterogeneous models and tools | large language model
34 | PartisanLens: A Multilingual Dataset of Hyperpartisan and Conspiratorial Immigration Narratives in European Media | PartisanLens: a multilingual dataset of hyperpartisan and conspiratorial immigration narratives in European media | large language model
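35 | Whose Facts Win? LLM Source Preferences under Knowledge Conflicts | Studies which sources LLMs prefer under knowledge conflicts and proposes a method that mitigates repetition bias while keeping source preferences consistent | large language model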
36 | SyncThink: A Training-Free Strategy to Align Inference Termination with Reasoning Saturation | SyncThink: a training-free strategy that aligns inference termination with reasoning saturation, reducing CoT overhead | chain-of-thought
37 | ELO: Efficient Layer-Specific Optimization for Continual Pretraining of Multilingual LLMs | ELO: an efficient layer-specific optimization method for continual pretraining of multilingual LLMs | large language model
38 | Agent-Dice: Disentangling Knowledge Updates via Geometric Consensus for Agent Continual Learning | Agent-Dice: disentangles knowledge updates via geometric consensus to mitigate catastrophic forgetting in agent continual learning | large language model
39 | Reasoning Model Is Superior LLM-Judge, Yet Suffers from Biases | Shows that reasoning models make better LLM judges but still exhibit biases | instruction following
40 | DiVA: Fine-grained Factuality Verification with Agentic-Discriminative Verifier | Proposes the Agentic-Discriminative Verifier (DiVA) for fine-grained factuality verification | large language model
41 | EvolMem: A Cognitive-Driven Benchmark for Multi-Session Dialogue Memory | EvolMem: a cognitively driven benchmark for evaluating multi-session dialogue memory | large language model
42 | DeepSynth-Eval: Objectively Evaluating Information Consolidation in Deep Survey Writing | DeepSynth-Eval: a benchmark for objectively evaluating information consolidation in deep survey writing | large language model

🔬 Pillar 2: RL Algorithms & Architecture (6 papers)

# | Title | One-line summary | Tags
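43 | MIND: From Passive Mimicry to Active Reasoning through Capability-Aware Multi-Perspective CoT Distillation | Proposes the MIND framework, which uses capability-aware multi-perspective CoT distillation to move small models from passive mimicry to active reasoning and improve their generalization | distillation, large language model, chain-of-thought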
44 | O-Researcher: An Open Ended Deep Research Model via Multi-Agent Distillation and Agentic RL | O-Researcher: an open-ended deep research model built through multi-agent distillation and agentic RL | reinforcement learning, distillation, large language model
45 | Step Potential Advantage Estimation: Harnessing Intermediate Confidence and Correctness for Efficient Mathematical Reasoning | Proposes SPAE, which uses intermediate confidence and correctness to improve reward allocation for efficient mathematical reasoning | reinforcement learning, large language model, chain-of-thought
46 | OLA: Output Language Alignment in Code-Switched LLM Interactions | OLA: a benchmark for evaluating output-language alignment of LLMs in code-switched interactions | DPO, large language model, chain-of-thought
47 | NeoAMT: Neologism-Aware Agentic Machine Translation with Reinforcement Learning | Proposes NeoAMT, a neologism-aware agentic machine translation framework built on reinforcement learning and Wiktionary | reinforcement learning, reward design
48 | KDCM: Reducing Hallucination in LLMs through Explicit Reasoning Structures | KDCM: reduces hallucination in LLMs through explicit reasoning structures | distillation, large language model
