cs.CL (2025-05-28)

📊 93 papers in total | 🔗 20 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (66 🔗12) | Pillar 2: RL & Architecture (24 🔗8) | Pillar 1: Robot Control (1) | Pillar 3: Perception & Semantics (1) | Pillar 8: Physics-based Animation (1)

🔬 Pillar 9: Embodied Foundation Models (66 papers)

#	Title	One-Sentence Summary	Tags	🔗
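1	EvoMoE: Expert Evolution in Mixture of Experts for Multimodal Large Language Models	EvoMoE: a mixture-of-experts model based on expert evolution for multimodal large language models	large language model multimodal
2	Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates	Proposes MAC, a multimodal adversarial compositionality benchmark that uses LLM-generated deceptive text to evaluate CLIP's vulnerability.	large language model multimodal
3	Spatial Knowledge Graph-Guided Multimodal Synthesis	Proposes SKG2DATA, which uses spatial knowledge graphs to guide multimodal data synthesis and improve the spatial perception of MLLMs.	large language model multimodal	✅
4	From Reasoning to Learning: A Survey on Hypothesis Discovery and Rule Learning with Large Language Models	Survey: research progress on hypothesis discovery and rule learning with large language models	large language model instruction following
5	THINK-Bench: Evaluating Thinking Efficiency and Chain-of-Thought Quality of Large Reasoning Models	Proposes THINK-Bench to evaluate the reasoning efficiency and chain-of-thought quality of large reasoning models, targeting the overthinking problem	large language model chain-of-thought
6	Speech as a Multimodal Digital Phenotype for Multi-Task LLM-based Mental Health Prediction	Proposes a speech digital-phenotype approach based on a multimodal LLM for multi-task mental health prediction	large language model multimodal
7	FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian	FAMA: the first large-scale open-science speech foundation model for English and Italian	foundation model
8	Large Language Models Often Know When They Are Being Evaluated	Shows that large language models have some degree of evaluation awareness	large language model
9	Multimodal Forecasting of Sparse Intraoperative Hypotension Events Powered by Language Model	Proposes IOHFuseLM, which uses a multimodal language model to predict sparse intraoperative hypotension events.	multimodal	✅
10	Beyond Completion: A Foundation Model for General Knowledge Graph Reasoning	Proposes MERRY, a foundation model for general knowledge graph reasoning that improves performance on both in-KG and out-of-KG tasks.	foundation model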
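11	Evaluating the Retrieval Robustness of Large Language Models	Evaluates the retrieval robustness of large language models in retrieval-augmented generation	large language model
12	Structured Memory Mechanisms for Stable Context Representation in Large Language Models	Proposes a structured memory mechanism that strengthens the context representation of large language models in long texts and multi-turn dialogue.	large language model
13	Talent or Luck? Evaluating Attribution Bias in Large Language Models	Proposes a cognitively grounded bias evaluation framework to address attribution bias in LLMs	large language model
14	Can Large Language Models Match the Conclusions of Systematic Reviews?	The MedEvidence benchmark reveals a gap between large language models and clinical experts in matching the conclusions of systematic reviews	large language model
15	Characterizing Bias: Benchmarking Large Language Models in Simplified versus Traditional Chinese	Reveals bias in large language models between Simplified and Traditional Chinese and builds an open-source evaluation benchmark.	large language model	✅
16	Precise In-Parameter Concept Erasure in Large Language Models	PISCES: erases specific concepts from large language models through precise edits in parameter space.	large language model
17	Do Large Language Models Think Like the Brain? Sentence-Level Evidences from Layer-Wise Embeddings and fMRI	Uses layer-wise embeddings and fMRI to study the sentence-level similarity in neural mechanisms between large language models and the human brain	large language model
18	Exploring the Landscape of Text-to-SQL with Large Language Models: Progresses, Challenges and Opportunities	Survey: progress, challenges, and opportunities in LLM-based Text-to-SQL	large language model
19	Evaluation of LLMs in Speech is Often Flawed: Test Set Contamination in Large Language Models for Speech Recognition	Exposes a flaw in LLM speech recognition evaluation: contamination of the LibriSpeech and Common Voice datasets	large language model
20	Say What You Mean: Natural Language Access Control with Large Language Models for Internet of Things	LACE: natural language access control for the Internet of Things using large language models	large language model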
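21	ICH-Qwen: A Large Language Model Towards Chinese Intangible Cultural Heritage	Proposes ICH-Qwen, a large language model for Chinese intangible cultural heritage	large language model
22	LoKI: Low-damage Knowledge Implanting of Large Language Models	LoKI: a low-damage knowledge implanting method for large language models that mitigates catastrophic forgetting.	large language model
23	Enhancing Tool Learning in Large Language Models with Hierarchical Error Checklists	Proposes the HiTEC framework, which improves tool learning in large language models through hierarchical error checklists	large language model
24	MemOS: An Operating System for Memory-Augmented Generation (MAG) in Large Language Models	MemOS: an operating system for memory-augmented generation (MAG) in large language models	large language model
25	BiasFilter: An Inference-Time Debiasing Framework for Large Language Models	BiasFilter: an inference-time debiasing framework for large language models	large language model
26	Pearl: A Multimodal Culturally-Aware Arabic Instruction Dataset	PEARL: a large-scale, culturally aware Arabic multimodal instruction dataset for improving the cultural understanding of LVLMs.	multimodal
27	Learning Composable Chains-of-Thought	Proposes a method for learning composable chains of thought that improves LLM generalization on complex reasoning tasks	large language model chain-of-thought
28	Adaptive Detoxification: Safeguarding General Capabilities of LLMs through Toxicity-Aware Knowledge Editing	ToxEdit: safeguards the general capabilities of LLMs through toxicity-aware knowledge editing	large language model instruction following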
29	Benchmarking Abstract and Reasoning Abilities Through A Theoretical Perspective		large language model chain-of-thought
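30	ArgInstruct: Specialized Instruction Fine-Tuning for Computational Argumentation	ArgInstruct: specialized instruction fine-tuning for computational argumentation	large language model instruction following
31	Chain-of-Talkers (CoTalk): Fast Human Annotation of Dense Image Captions	Proposes Chain-of-Talkers (CoTalk), which speeds up human annotation of dense image captions and improves annotation quality.	multimodal
32	Unraveling LoRA Interference: Orthogonal Subspaces for Robust Model Merging	Proposes OSRM, an orthogonal-subspace model merging method that addresses performance degradation when merging LoRA models.	large language model
33	Self-Error-Instruct: Generalizing from Errors for LLMs Mathematical Reasoning	Proposes the Self-Error-Instruct framework, which improves LLM mathematical reasoning by generalizing from errors	large language model
34	Evaluation Hallucination in Multi-Round Incomplete Information Lateral-Driven Reasoning Tasks	Reveals evaluation hallucination in multi-round, incomplete-information lateral reasoning tasks for LLMs and proposes improvements	large language model
35	If Pigs Could Fly... Can LLMs Logically Reason Through Counterfactuals?	The CounterLogic dataset reveals that LLM logical ability degrades in counterfactual reasoning; the proposed Self-Segregate method significantly improves performance.	large language model
36	Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models	JQL: an efficient language-model-based method for filtering multilingual pretraining data	large language model
37	Speculative Decoding Meets Quantization: Compatibility Evaluation and Hierarchical Framework Design	Proposes a hierarchical speculative decoding framework that addresses the excessive computational overhead of speculative decoding on quantized models	large language model	✅
38	DeepRTL2: A Versatile Model for RTL-Related Tasks	DeepRTL2: a versatile large language model for RTL-related tasks	large language model
39	Curse of High Dimensionality Issue in Transformer for Long-context Modeling	Proposes Dynamic Group Attention (DGA) to address the curse of high dimensionality in Transformer long-context modeling	large language model	✅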
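40	Knowledge Base Construction for Knowledge-Augmented Text-to-SQL	Builds a knowledge base to augment Text-to-SQL, improving LLM query accuracy on domain-specific databases	large language model
41	OWL: Probing Cross-Lingual Recall of Memorized Texts via World Literature	The OWL dataset reveals LLMs' cross-lingual recall of memorized texts from world literature, which holds even for low-resource languages.	large language model
42	What Has Been Lost with Synthetic Evaluation?	Assesses the validity of LLM-generated benchmarks, revealing the information lost in synthetic evaluation	large language model
43	First Steps Towards Overhearing LLM Agents: A Case Study With Dungeons & Dragons Gameplay	Proposes the "overhearing agent" paradigm, in which multimodal LLMs assist human conversation, with Dungeons & Dragons gameplay as a case study.	multimodal	✅
44	Automated Essay Scoring Incorporating Annotations from Automated Feedback Systems	Improves the accuracy of automated essay scoring by incorporating annotations from automated feedback systems	large language model
45	ClaimPKG: Enhancing Claim Verification via Pseudo-Subgraph Generation with Lightweight Specialized LLM	ClaimPKG: strengthens claim verification by generating pseudo-subgraphs with a lightweight specialized LLM	large language model
46	Co-Saving: Resource Aware Multi-Agent Collaboration for Software Development	Proposes Co-Saving, a resource-aware multi-agent collaboration framework for software development that improves efficiency and code quality.	large language model
47	GateNLP at SemEval-2025 Task 10: Hierarchical Three-Step Prompting for Multilingual Narrative Classification	Proposes Hierarchical Three-Step Prompting (H3Prompt) for multilingual narrative classification, achieving leading results in the SemEval-2025 task.	large language model	✅
48	Self-Critique and Refinement for Faithful Natural Language Explanations	Proposes the SR-NLE framework, which improves the faithfulness of LLM natural language explanations through self-critique and refinement	large language model
49	GuessArena: Guess Who I Am? A Self-Adaptive Framework for Evaluating LLMs in Domain-Specific Knowledge and Reasoning	GuessArena: a self-adaptive framework for evaluating LLMs' domain-specific knowledge and reasoning	large language model
50	Stochastic Chameleons: Irrelevant Context Hallucinations Reveal Class-Based (Mis)Generalization in LLMs	Stochastic chameleons in LLMs: hallucinations induced by irrelevant context reveal class-based (mis)generalization	large language model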
51	Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding	Fast-dLLM: accelerates diffusion LLM inference through KV caching and parallel decoding, with no additional training.	large language model
52	Less, but Better: Efficient Multilingual Expansion for LLMs via Layer-wise Mixture-of-Experts	Proposes LayerMoE, an efficient multilingual expansion method for LLMs based on layer-wise mixture-of-experts	large language model
53	Enabling Flexible Multi-LLM Integration for Scalable Knowledge Aggregation	Proposes a flexible multi-LLM integration framework for scalable knowledge aggregation	large language model	✅
54	Fair Document Valuation in LLM Summaries via Shapley Values	Proposes Cluster Shapley, a Shapley-value-based algorithm for fairly valuing document contributions in LLM summaries.	large language model
55	SkewRoute: Training-Free LLM Routing for Knowledge Graph Retrieval-Augmented Generation via Score Skewness of Retrieved Context	Proposes SkewRoute, a training-free LLM routing method for knowledge graph RAG that routes based on the score skewness of the retrieved context.	large language model	✅
56	Measuring Sycophancy of Language Models in Multi-turn Dialogues	Proposes SYCON Bench for evaluating the sycophantic behavior of language models in multi-turn dialogues	large language model	✅
57	Advancing Expert Specialization for Better MoE	Proposes orthogonality and variance losses that improve expert specialization in MoE models	large language model
58	InComeS: Integrating Compression and Selection Mechanisms into LLMs for Efficient Model Editing	InComeS: augments LLMs with compression and selection mechanisms for efficient model editing	large language model
59	ChatCFD: An LLM-Driven Agent for End-to-End CFD Automation with Domain-Specific Structured Reasoning	ChatCFD: an LLM-driven agent for end-to-end CFD automation with domain-specific structured reasoning	large language model	✅
60	Safeguarding Privacy of Retrieval Data against Membership Inference Attacks: Is This Query Too Close to Home?	Proposes a similarity-based MIA detection framework to protect the privacy of retrieval data in RAG systems	large language model
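61	Reviewing Scientific Papers for Critical Problems With Reasoning LLMs: Baseline Approaches and Automatic Evaluation	Uses reasoning LLMs to review scientific papers for critical problems: baseline approaches and an automatic evaluation framework	large language model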
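62	Found in Translation: Measuring Multilingual LLM Consistency as Simple as Translate then Evaluate	Proposes a translate-then-evaluate framework for measuring the consistency of multilingual LLMs across languages.	large language model
63	Leveraging Interview-Informed LLMs to Model Survey Responses: Comparative Insights from AI-Generated and Human Data	Uses interview-informed large language models to simulate survey responses and compares AI-generated data with human data.	large language model
64	Seeing the Threat: Vulnerabilities in Vision-Language Models to Adversarial Attack	Reveals the vulnerability of vision-language models to adversarial attacks and proposes a two-stage evaluation framework and safety alignment guidelines.	multimodal
65	EFIM: Efficient Serving of LLMs for Infilling Tasks with Improved KV Cache Reuse	EFIM: efficient serving of LLM infilling tasks through improved KV cache reuse	large language model	✅
66	Principled Content Selection to Generate Diverse and Personalized Multi-Document Summaries	Proposes a multi-document summarization method based on principled content selection that improves diversity and personalization.	large language model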

🔬 Pillar 2: RL & Architecture (24 papers)

#	Title	One-Sentence Summary	Tags	🔗
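67	Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start	Proposes a reinforcement learning method with cold start that improves the reasoning ability of multimodal large language models	reinforcement learning large language model multimodal	✅
68	AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models	Proposes the AutoL2S framework, which improves the efficiency of large language models through adaptive long-short reasoning.	distillation large language model chain-of-thought
69	Large Language Models for Depression Recognition in Spoken Language Integrating Psychological Knowledge	Proposes a multimodal depression detection method that integrates psychological knowledge and uses large language models to improve recognition accuracy.	MAE large language model multimodal	✅
70	Reverse Preference Optimization for Complex Instruction Following	Proposes Reverse Preference Optimization (RPO), which improves LLM performance on complex instruction-following tasks	DPO large language model instruction following
71	Derailing Non-Answers via Logit Suppression at Output Subspace Boundaries in RLHF-Aligned Language Models	Proposes a training-free logit suppression strategy that lowers the refusal rate of RLHF-aligned language models on sensitive content.	RLHF distillation large language model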
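72	Multi-MLLM Knowledge Distillation for Out-of-Context News Detection	Proposes knowledge distillation from multiple multimodal large language models for out-of-context news detection in low-resource settings	DPO distillation large language model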
73	Enhancing Paraphrase Type Generation: The Impact of DPO and RLHF Evaluated with Human-Ranked Data	Uses DPO and RLHF to improve paraphrase type generation, evaluated with human-ranked data	RLHF DPO direct preference optimization
74	The Price of a Second Thought: On the Evaluation of Reasoning Efficiency in Large Language Models	Proposes COTHINK, a framework that improves the reasoning efficiency of large language models and reduces computational cost.	reinforcement learning large language model
75	Emotion-o1: Adaptive Long Reasoning for Emotion Understanding in LLMs	Emotion-o1: adaptive long reasoning that improves emotion understanding in LLMs	reinforcement learning HuMoR large language model
76	Beyond path selection: Better LLMs for Scientific Information Extraction with MimicSFT and Relevance and Rule-induced(R$^2$)GRPO	Proposes MimicSFT and R²GRPO, which improve LLM reasoning in scientific information extraction	reinforcement learning distillation large language model	✅
77	RAD: Redundancy-Aware Distillation for Hybrid Models via Self-Speculative Decoding	Proposes RAD, redundancy-aware distillation for hybrid models via self-speculative decoding	SSM state space model distillation
78	WorkForceAgent-R1: Incentivizing Reasoning Capability in LLM-based Web Agents via Reinforcement Learning	WorkForceAgent-R1: uses reinforcement learning to improve the reasoning of LLM-based web agents in enterprise environments	reinforcement learning large language model
79	LaMDAgent: An Autonomous Framework for Post-Training Pipeline Optimization via LLM Agents	LaMDAgent: uses LLM agents to autonomously optimize post-training pipelines and improve model performance.	preference learning large language model instruction following
80	Enhancing Long-Chain Reasoning Distillation through Error-Aware Self-Reflection	Proposes the ORION framework, which enhances long-chain reasoning distillation through error-aware self-reflection	distillation large language model	✅
81	Learning to Route Queries Across Knowledge Bases for Step-wise Retrieval-Augmented Reasoning	Proposes R1-Router, which learns to route queries across knowledge bases for multimodal retrieval-augmented reasoning.	reinforcement learning large language model multimodal
82	Jailbreak Distillation: Renewable Safety Benchmarking	Proposes the Jailbreak Distillation framework for building renewable safety benchmarks for large language models	distillation large language model
83	Text2Grad: Reinforcement Learning from Natural Language Feedback	Text2Grad: reinforcement learning from natural language feedback, enabling fine-grained gradient updates.	reinforcement learning RLHF	✅
84	Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition	Pangu Embedded: an efficient dual-system LLM reasoner with metacognition	reinforcement learning distillation large language model
85	First SFT, Second RL, Third UPT: Continual Improving Multi-Modal LLM Reasoning via Unsupervised Post-Training	Proposes the MM-UPT framework, which continually improves multimodal LLM reasoning through unsupervised post-training.	reinforcement learning large language model	✅
86	VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning	Proposes VRAG-RL to address reasoning challenges in understanding visually rich information	reinforcement learning	✅
87	Training Language Models to Generate Quality Code with Program Analysis Feedback	The REAL framework: trains language models to generate high-quality code using program analysis feedback	reinforcement learning large language model
88	Latent Reasoning via Sentence Embedding Prediction	Proposes a latent reasoning framework based on sentence embedding prediction that improves the abstract reasoning of language models.	representation learning chain-of-thought
89	The Climb Carves Wisdom Deeper Than the Summit: On the Noisy Rewards in Learning to Reason	Studies how reward noise affects LLM reasoning and proposes a calibration method based on reasoning-pattern rewards.	reinforcement learning large language model	✅
90	ValueSim: Generating Backstories to Model Individual Value Systems	ValueSim: models individual value systems by generating backstories, improving LLM value alignment.	reinforcement learning large language model

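🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
91	ASyMOB: Algebraic Symbolic Mathematical Operations Benchmark	Proposes ASyMOB, a benchmark of algebraic symbolic mathematical operations, to evaluate the symbolic manipulation ability and generalization of LLMs	manipulation large language model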

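🔬 Pillar 3: Perception & Semantics (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
92	Conversational Alignment with Artificial Intelligence in Context	Proposes the CONTEXT-ALIGN framework for evaluating how well conversational AI agents align with human communicative norms.	affordance large language model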

🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
93	CoMaPOI: A Collaborative Multi-Agent Framework for Next POI Prediction Bridging the Gap Between Trajectory and Language	CoMaPOI: a collaborative multi-agent framework that bridges trajectory and language to improve next-POI prediction accuracy	spatiotemporal large language model
