cs.CL (2025-05-28)

📊 93 papers total | 🔗 20 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (66 🔗12) · Pillar 2: RL & Architecture (24 🔗8) · Pillar 1: Robot Control (1) · Pillar 3: Perception & Semantics (1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 9: Embodied Foundation Models (66 papers)

# | Title | One-line Summary | Tags | 🔗
1 | EvoMoE: Expert Evolution in Mixture of Experts for Multimodal Large Language Models | EvoMoE: a mixture-of-experts approach for multimodal large language models based on expert evolution. | large language model multimodal
2 | Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates | Proposes MAC, a multimodal adversarial-compositionality benchmark that uses LLM-generated deceptive text to probe CLIP's vulnerability. | large language model multimodal
3 | Spatial Knowledge Graph-Guided Multimodal Synthesis | Proposes SKG2DATA, which uses spatial knowledge graphs to guide multimodal data synthesis and improve MLLMs' spatial awareness. | large language model multimodal
4 | From Reasoning to Learning: A Survey on Hypothesis Discovery and Rule Learning with Large Language Models | Survey of hypothesis discovery and rule learning with large language models. | large language model instruction following
5 | THINK-Bench: Evaluating Thinking Efficiency and Chain-of-Thought Quality of Large Reasoning Models | Proposes Think-Bench to evaluate large models' reasoning efficiency and chain-of-thought quality, targeting the over-thinking problem. | large language model chain-of-thought
6 | Speech as a Multimodal Digital Phenotype for Multi-Task LLM-based Mental Health Prediction | Proposes a multimodal-LLM approach that treats speech as a digital phenotype for multi-task mental health prediction. | large language model multimodal
7 | FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian | FAMA: the first large-scale open-science speech foundation model for English and Italian. | foundation model
8 | Large Language Models Often Know When They Are Being Evaluated | Shows that large language models possess a degree of evaluation awareness. | large language model
9 | Multimodal Forecasting of Sparse Intraoperative Hypotension Events Powered by Language Model | Proposes IOHFuseLM, a multimodal language model for forecasting sparse intraoperative hypotension events. | multimodal
10 | Beyond Completion: A Foundation Model for General Knowledge Graph Reasoning | Proposes MERRY, a foundation model for general knowledge graph reasoning that improves performance on both in-KG and out-of-KG tasks. | foundation model
11 | Evaluating the Retrieval Robustness of Large Language Models | Evaluates the retrieval robustness of large language models in retrieval-augmented generation. | large language model
12 | Structured Memory Mechanisms for Stable Context Representation in Large Language Models | Proposes structured memory mechanisms to strengthen LLMs' context representation in long texts and multi-turn dialogue. | large language model
13 | Talent or Luck? Evaluating Attribution Bias in Large Language Models | Proposes a cognitively grounded bias-evaluation framework to address attribution bias in LLMs. | large language model
14 | Can Large Language Models Match the Conclusions of Systematic Reviews? | The MedEvidence benchmark reveals a gap between LLMs and clinical experts in matching systematic-review conclusions. | large language model
15 | Characterizing Bias: Benchmarking Large Language Models in Simplified versus Traditional Chinese | Reveals LLM bias between Simplified and Traditional Chinese and builds an open-source evaluation benchmark. | large language model
16 | Precise In-Parameter Concept Erasure in Large Language Models | PISCES: erases specific concepts from LLMs via precise edits in parameter space. | large language model
17 | Do Large Language Models Think Like the Brain? Sentence-Level Evidences from Layer-Wise Embeddings and fMRI | Uses layer-wise embeddings and fMRI to study sentence-level similarity between LLMs and human neural mechanisms. | large language model
18 | Exploring the Landscape of Text-to-SQL with Large Language Models: Progresses, Challenges and Opportunities | Survey of progress, challenges, and opportunities in LLM-based Text-to-SQL. | large language model
19 | Evaluation of LLMs in Speech is Often Flawed: Test Set Contamination in Large Language Models for Speech Recognition | Reveals flaws in LLM speech-recognition evaluation: contamination of the LibriSpeech and Common Voice test sets. | large language model
20 | Say What You Mean: Natural Language Access Control with Large Language Models for Internet of Things | LACE: natural-language access control for the Internet of Things using large language models. | large language model
21 | ICH-Qwen: A Large Language Model Towards Chinese Intangible Cultural Heritage | Proposes ICH-Qwen, a large language model for Chinese intangible cultural heritage. | large language model
22 | LoKI: Low-damage Knowledge Implanting of Large Language Models | LoKI: a low-damage knowledge-implanting method for LLMs that effectively mitigates catastrophic forgetting. | large language model
23 | Enhancing Tool Learning in Large Language Models with Hierarchical Error Checklists | Proposes the HiTEC framework, which improves LLM tool learning via hierarchical error checklists. | large language model
24 | MemOS: An Operating System for Memory-Augmented Generation (MAG) in Large Language Models | MemOS: an operating system for memory-augmented generation (MAG) in large language models. | large language model
25 | BiasFilter: An Inference-Time Debiasing Framework for Large Language Models | BiasFilter: an inference-time debiasing framework for large language models. | large language model
26 | Pearl: A Multimodal Culturally-Aware Arabic Instruction Dataset | PEARL: a large-scale, culturally aware Arabic multimodal instruction dataset for improving LVLMs' cultural understanding. | multimodal
27 | Learning Composable Chains-of-Thought | Proposes learning composable chains of thought to improve LLM generalization on complex reasoning tasks. | large language model chain-of-thought
28 | Adaptive Detoxification: Safeguarding General Capabilities of LLMs through Toxicity-Aware Knowledge Editing | ToxEdit: safeguards LLMs' general capabilities via toxicity-aware knowledge editing. | large language model instruction following
29 | Benchmarking Abstract and Reasoning Abilities Through A Theoretical Perspective | | large language model chain-of-thought
30 | ArgInstruct: Specialized Instruction Fine-Tuning for Computational Argumentation | ArgInstruct: specialized instruction fine-tuning for computational argumentation. | large language model instruction following
31 | Chain-of-Talkers (CoTalk): Fast Human Annotation of Dense Image Captions | Proposes Chain-of-Talkers (CoTalk) to speed up human annotation of dense image captions while improving annotation quality. | multimodal
32 | Unraveling LoRA Interference: Orthogonal Subspaces for Robust Model Merging | Proposes orthogonal-subspace model merging (OSRM) to address performance degradation when merging LoRA models. | large language model
33 | Self-Error-Instruct: Generalizing from Errors for LLMs Mathematical Reasoning | Proposes the Self-Error-Instruct framework, improving LLM mathematical reasoning by generalizing from errors. | large language model
34 | Evaluation Hallucination in Multi-Round Incomplete Information Lateral-Driven Reasoning Tasks | Reveals evaluation hallucination in multi-round incomplete-information lateral reasoning tasks and proposes improvements. | large language model
35 | If Pigs Could Fly... Can LLMs Logically Reason Through Counterfactuals? | The CounterLogic dataset reveals degraded LLM logical ability in counterfactual reasoning; the proposed Self-Segregate method substantially improves performance. | large language model
36 | Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models | JQL: an efficient language-model-based method for filtering multilingual pretraining data. | large language model
37 | Speculative Decoding Meets Quantization: Compatibility Evaluation and Hierarchical Framework Design | Proposes a hierarchical speculative-decoding framework to reduce the excessive compute overhead of speculative decoding on quantized models. | large language model
38 | DeepRTL2: A Versatile Model for RTL-Related Tasks | DeepRTL2: a versatile large language model for RTL-related tasks. | large language model
39 | Curse of High Dimensionality Issue in Transformer for Long-context Modeling | Proposes dynamic group attention (DGA) to address the curse of high dimensionality in Transformer long-context modeling. | large language model
40 | Knowledge Base Construction for Knowledge-Augmented Text-to-SQL | Constructs knowledge bases to augment Text-to-SQL, improving LLM query accuracy on domain databases. | large language model
41 | OWL: Probing Cross-Lingual Recall of Memorized Texts via World Literature | The OWL dataset reveals LLMs' cross-lingual recall of memorized world literature, effective even for low-resource languages. | large language model
42 | What Has Been Lost with Synthetic Evaluation? | Evaluates the validity of LLM-generated benchmarks, revealing information loss in synthetic evaluation. | large language model
43 | First Steps Towards Overhearing LLM Agents: A Case Study With Dungeons & Dragons Gameplay | Proposes the "overhearing agent" paradigm, using multimodal LLMs to assist human conversation, with Dungeons & Dragons gameplay as a case study. | multimodal
44 | Automated Essay Scoring Incorporating Annotations from Automated Feedback Systems | Improves automated essay scoring accuracy by incorporating annotations from automated feedback systems. | large language model
45 | ClaimPKG: Enhancing Claim Verification via Pseudo-Subgraph Generation with Lightweight Specialized LLM | ClaimPKG: enhances claim verification via pseudo-subgraph generation with a lightweight specialized LLM. | large language model
46 | Co-Saving: Resource Aware Multi-Agent Collaboration for Software Development | Proposes Co-Saving, a resource-aware multi-agent collaboration framework for software development that improves efficiency and code quality. | large language model
47 | GateNLP at SemEval-2025 Task 10: Hierarchical Three-Step Prompting for Multilingual Narrative Classification | Proposes hierarchical three-step prompting (H3Prompt) for multilingual narrative classification, achieving a leading result on the SemEval-2025 task. | large language model
48 | Self-Critique and Refinement for Faithful Natural Language Explanations | Proposes the SR-NLE framework, improving the faithfulness of LLM natural-language explanations via self-critique and refinement. | large language model
49 | GuessArena: Guess Who I Am? A Self-Adaptive Framework for Evaluating LLMs in Domain-Specific Knowledge and Reasoning | GuessArena: a self-adaptive framework for evaluating LLMs' domain-specific knowledge and reasoning. | large language model
50 | Stochastic Chameleons: Irrelevant Context Hallucinations Reveal Class-Based (Mis)Generalization in LLMs | Stochastic chameleons in LLMs: hallucinations induced by irrelevant context reveal class-based (mis)generalization. | large language model
51 | Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding | Fast-dLLM: training-free acceleration of diffusion LLM inference by enabling KV caching and parallel decoding. | large language model
52 | Less, but Better: Efficient Multilingual Expansion for LLMs via Layer-wise Mixture-of-Experts | Proposes LayerMoE, an efficient multilingual expansion method for LLMs based on layer-wise mixture-of-experts. | large language model
53 | Enabling Flexible Multi-LLM Integration for Scalable Knowledge Aggregation | Proposes a flexible multi-LLM integration framework for scalable knowledge aggregation. | large language model
54 | Fair Document Valuation in LLM Summaries via Shapley Values | Proposes the Shapley-value-based Cluster Shapley algorithm for fair valuation of document contributions in LLM summaries. | large language model
55 | SkewRoute: Training-Free LLM Routing for Knowledge Graph Retrieval-Augmented Generation via Score Skewness of Retrieved Context | Proposes SkewRoute, a training-free LLM routing method for knowledge-graph RAG based on the score skewness of retrieved context. | large language model
56 | Measuring Sycophancy of Language Models in Multi-turn Dialogues | Proposes SYCON Bench for evaluating the sycophancy of language models in multi-turn dialogues. | large language model
57 | Advancing Expert Specialization for Better MoE | Proposes orthogonality and variance losses to improve expert specialization in MoE models. | large language model
58 | InComeS: Integrating Compression and Selection Mechanisms into LLMs for Efficient Model Editing | InComeS: augments LLMs with compression and selection mechanisms for efficient model editing. | large language model
59 | ChatCFD: An LLM-Driven Agent for End-to-End CFD Automation with Domain-Specific Structured Reasoning | ChatCFD: an LLM-driven agent for end-to-end CFD automation with domain-specific structured reasoning. | large language model
60 | Safeguarding Privacy of Retrieval Data against Membership Inference Attacks: Is This Query Too Close to Home? | Proposes a similarity-based MIA detection framework to protect the privacy of retrieval data in RAG systems. | large language model
61 | Reviewing Scientific Papers for Critical Problems With Reasoning LLMs: Baseline Approaches and Automatic Evaluation | Uses reasoning LLMs to review scientific papers for critical problems: baseline approaches and an automatic evaluation framework. | large language model
62 | Found in Translation: Measuring Multilingual LLM Consistency as Simple as Translate then Evaluate | Proposes a translate-then-evaluate framework for measuring multilingual LLM consistency across languages. | large language model
63 | Leveraging Interview-Informed LLMs to Model Survey Responses: Comparative Insights from AI-Generated and Human Data | Uses interview-informed LLMs to model survey responses, comparing AI-generated and human data. | large language model
64 | Seeing the Threat: Vulnerabilities in Vision-Language Models to Adversarial Attack | Reveals vision-language models' vulnerability to adversarial attack; proposes a two-stage evaluation framework and safety-alignment guidelines. | multimodal
65 | EFIM: Efficient Serving of LLMs for Infilling Tasks with Improved KV Cache Reuse | EFIM: efficient serving of LLMs for infilling tasks via improved KV-cache reuse. | large language model
66 | Principled Content Selection to Generate Diverse and Personalized Multi-Document Summaries | Proposes principled content selection for multi-document summarization, improving diversity and personalization. | large language model

🔬 Pillar 2: RL & Architecture (24 papers)

# | Title | One-line Summary | Tags | 🔗
67 | Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start | Proposes a cold-start reinforcement learning method to improve the reasoning of multimodal large language models. | reinforcement learning large language model multimodal
68 | AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models | Proposes the AutoL2S framework, improving LLM efficiency via adaptive long-short reasoning. | distillation large language model chain-of-thought
69 | Large Language Models for Depression Recognition in Spoken Language Integrating Psychological Knowledge | Proposes a multimodal depression-detection method that integrates psychological knowledge, using LLMs to improve recognition accuracy. | MAE large language model multimodal
70 | Reverse Preference Optimization for Complex Instruction Following | Proposes Reverse Preference Optimization (RPO) to improve LLM performance on complex instruction-following tasks. | DPO large language model instruction following
71 | Derailing Non-Answers via Logit Suppression at Output Subspace Boundaries in RLHF-Aligned Language Models | Proposes a logit-suppression strategy that lowers the refusal rate of RLHF-aligned language models on sensitive content without training. | RLHF distillation large language model
72 | Multi-MLLM Knowledge Distillation for Out-of-Context News Detection | Proposes multi-MLLM knowledge distillation for out-of-context news detection in low-resource settings. | DPO distillation large language model
73 | Enhancing Paraphrase Type Generation: The Impact of DPO and RLHF Evaluated with Human-Ranked Data | Uses DPO and RLHF to improve paraphrase-type generation, evaluated with human-ranked data. | RLHF DPO direct preference optimization
74 | The Price of a Second Thought: On the Evaluation of Reasoning Efficiency in Large Language Models | Proposes COTHINK, a framework that improves LLM reasoning efficiency and reduces compute cost. | reinforcement learning large language model
75 | Emotion-o1: Adaptive Long Reasoning for Emotion Understanding in LLMs | Emotion-o1: adaptive long reasoning improves LLMs' emotion understanding. | reinforcement learning HuMoR large language model
76 | Beyond path selection: Better LLMs for Scientific Information Extraction with MimicSFT and Relevance and Rule-induced(R$^2$)GRPO | Proposes MimicSFT and R²GRPO to improve LLM reasoning in scientific information extraction. | reinforcement learning distillation large language model
77 | RAD: Redundancy-Aware Distillation for Hybrid Models via Self-Speculative Decoding | Proposes RAD: redundancy-aware distillation for hybrid models via self-speculative decoding. | SSM state space model distillation
78 | WorkForceAgent-R1: Incentivizing Reasoning Capability in LLM-based Web Agents via Reinforcement Learning | WorkForceAgent-R1: uses reinforcement learning to improve the reasoning of LLM-based web agents in enterprise settings. | reinforcement learning large language model
79 | LaMDAgent: An Autonomous Framework for Post-Training Pipeline Optimization via LLM Agents | LaMDAgent: uses LLM agents to autonomously optimize post-training pipelines and improve model performance. | preference learning large language model instruction following
80 | Enhancing Long-Chain Reasoning Distillation through Error-Aware Self-Reflection | Proposes the ORION framework, enhancing long-chain reasoning distillation via error-aware self-reflection. | distillation large language model
81 | Learning to Route Queries Across Knowledge Bases for Step-wise Retrieval-Augmented Reasoning | Proposes R1-Router, which learns to route queries across knowledge bases for multimodal retrieval-augmented reasoning. | reinforcement learning large language model multimodal
82 | Jailbreak Distillation: Renewable Safety Benchmarking | Proposes the Jailbreak Distillation framework for building renewable LLM safety benchmarks. | distillation large language model
83 | Text2Grad: Reinforcement Learning from Natural Language Feedback | Text2Grad: reinforcement learning from natural-language feedback, enabling fine-grained gradient updates. | reinforcement learning RLHF
84 | Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition | Pangu Embedded: an efficient dual-system LLM reasoner with metacognition. | reinforcement learning distillation large language model
85 | First SFT, Second RL, Third UPT: Continual Improving Multi-Modal LLM Reasoning via Unsupervised Post-Training | Proposes the MM-UPT framework, continually improving multimodal LLM reasoning via unsupervised post-training. | reinforcement learning large language model
86 | VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning | Proposes VRAG-RL to address reasoning challenges in understanding visually rich information. | reinforcement learning
87 | Training Language Models to Generate Quality Code with Program Analysis Feedback | REAL framework: trains language models to generate high-quality code using program-analysis feedback. | reinforcement learning large language model
88 | Latent Reasoning via Sentence Embedding Prediction | Proposes a latent reasoning framework based on sentence-embedding prediction, improving language models' abstract reasoning. | representation learning chain-of-thought
89 | The Climb Carves Wisdom Deeper Than the Summit: On the Noisy Rewards in Learning to Reason | Studies the effect of reward noise on LLM reasoning and proposes a calibration method based on reasoning-pattern rewards. | reinforcement learning large language model
90 | ValueSim: Generating Backstories to Model Individual Value Systems | ValueSim: models individual value systems by generating backstories, improving LLM value alignment. | reinforcement learning large language model

🔬 Pillar 1: Robot Control (1 paper)

# | Title | One-line Summary | Tags | 🔗
91 | ASyMOB: Algebraic Symbolic Mathematical Operations Benchmark | Proposes the ASyMOB benchmark for algebraic symbolic mathematical operations, evaluating LLMs' symbolic-manipulation ability and generalization. | manipulation large language model

🔬 Pillar 3: Perception & Semantics (1 paper)

# | Title | One-line Summary | Tags | 🔗
92 | Conversational Alignment with Artificial Intelligence in Context | Proposes the CONTEXT-ALIGN framework to evaluate how well AI conversational agents align with human communicative norms. | affordance large language model

🔬 Pillar 8: Physics-based Animation (1 paper)

# | Title | One-line Summary | Tags | 🔗
93 | CoMaPOI: A Collaborative Multi-Agent Framework for Next POI Prediction Bridging the Gap Between Trajectory and Language | CoMaPOI: a collaborative multi-agent framework bridging trajectory and language to improve next-POI prediction accuracy. | spatiotemporal large language model

⬅️ Back to the cs.CL index · 🏠 Back to the home page