cs.CL (2025-05-21)

📊 104 papers in total | 🔗 20 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (79 🔗14) · Pillar 2: RL Algorithms and Architecture (21 🔗5) · Pillar 1: Robot Control (2) · Pillar 6: Video Extraction and Matching (1 🔗1) · Pillar 3: Spatial Perception and Semantics (1)

🔬 Pillar 9: Embodied Foundation Models (79 papers)

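# | Title | One-sentence summary | Tags
1 | Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs | Proposes the KnowRecall and VisRecall benchmarks to evaluate the cross-lingual consistency of multimodal LLMs | large language model, multimodal
2 | Feature Extraction and Steering for Enhanced Chain-of-Thought Reasoning in Language Models | Proposes a CoT reasoning enhancement method based on feature extraction and steering that requires no external datasets | large language model, chain-of-thought
3 | NeSyGeo: A Neuro-Symbolic Framework for Multimodal Geometric Reasoning Data Generation | Proposes NeSyGeo, a neuro-symbolic framework for generating diverse and generalizable multimodal geometric reasoning data | large language model, multimodal
4 | Forging Time Series with Language: A Large Language Model Approach to Synthetic Data Generation | SDForger: uses large language models to generate high-quality synthetic time series data | large language model, multimodal
5 | PhysicsArena: The First Multimodal Physics Reasoning Benchmark Exploring Variable, Process, and Solution Dimensions | Proposes PhysicsArena, the first multimodal physics reasoning benchmark, evaluating the variable, process, and solution dimensions | large language model, multimodal
6 | RRTL: Red Teaming Reasoning Large Language Models in Tool Learning | Proposes RRTL to evaluate the safety of reasoning LLMs in tool learning | large language model, chain-of-thought
7 | TACO: Enhancing Multimodal In-context Learning via Task Mapping-Guided Sequence Configuration | TACO: enhances multimodal in-context learning through task-mapping-guided sequence configuration | multimodal
8 | HDLxGraph: Bridging Large Language Models and HDL Repositories via HDL Graph Databases | HDLxGraph: bridges large language models and HDL code repositories via HDL graph databases, improving performance on hardware design tasks | large language model
9 | Listen to the Context: Towards Faithful Large Language Models for Retrieval Augmented Generation on Climate Questions | Proposes ClimateGPT Faithful+ for climate questions, improving LLM faithfulness in retrieval-augmented generation | large language model
10 | SLMEval: Entropy-Based Calibration for Human-Aligned Evaluation of Large Language Models | Proposes SLMEval, which calibrates LLM evaluators via entropy maximization to improve agreement with human judgments | large language model
11 | Extracting Probabilistic Knowledge from Large Language Models for Bayesian Network Parameterization | Uses large language models to extract probabilistic knowledge for Bayesian network parameterization | large language model
12 | FedSEA-LLaMA: A Secure, Efficient and Adaptive Federated Splitting Framework for Large Language Models | FedSEA-LLaMA: a secure, efficient, and adaptive federated splitting framework for LLaMA2 | large language model
13 | LFTF: Locating First and Then Fine-Tuning for Mitigating Gender Bias in Large Language Models | Proposes the LFTF algorithm, which mitigates gender bias by locating and then fine-tuning specific LLM modules | large language model
14 | Towards Explainable Temporal Reasoning in Large Language Models: A Structure-Aware Generative Framework | Proposes the GETER framework to improve the explainability of LLM temporal reasoning, and builds a corresponding benchmark | large language model
15 | Prolonged Reasoning Is Not All You Need: Certainty-Based Adaptive Routing for Efficient LLM/MLLM Reasoning | Proposes CAR, a certainty-based adaptive reasoning framework that improves the efficiency and accuracy of LLM/MLLM reasoning | large language model, multimodal, chain-of-thought
16 | OpenEthics: A Comprehensive Ethical Evaluation of Open-Source Generative Large Language Models | Proposes OpenEthics for a comprehensive ethical evaluation of open-source generative LLMs | large language model
17 | Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering | CoPriva: a large-scale benchmark for security policy preservation by LLMs in question answering | large language model
18 | Evolutionary Computation and Large Language Models: A Survey of Methods, Synergies, and Applications | A survey of the interplay between evolutionary computation and LLMs: methods, synergies, and applications | large language model
19 | After Retrieval, Before Generation: Enhancing the Trustworthiness of Large Language Models in Retrieval-Augmented Generation | The BRIDGE framework improves LLM trustworthiness in RAG under knowledge-conflict scenarios | large language model
20 | Comparative Evaluation of Prompting and Fine-Tuning for Applying Large Language Models to Grid-Structured Geospatial Data | Compares prompting and fine-tuning for applying LLMs to grid-structured geospatial data | large language model
21 | Beyond Empathy: Integrating Diagnostic and Therapeutic Reasoning with Large Language Models for Mental Health Counseling | PsyLLM: the first LLM to integrate diagnostic and therapeutic reasoning for mental health counseling | large language model
22 | LyapLock: Bounded Knowledge Preservation in Sequential Large Language Model Editing | LyapLock: a framework that guarantees bounded knowledge preservation in sequential LLM editing | large language model
23 | Can Large Language Models be Effective Online Opinion Miners? | Proposes the OOMB benchmark dataset to evaluate how effective LLMs are at online opinion mining | large language model
24 | ThinkLess: A Training-Free Inference-Efficient Method for Reducing Reasoning Redundancy | ThinkLess: a training-free inference acceleration method that reduces redundancy in LLM reasoning | large language model, instruction following, chain-of-thought
25 | Cultural Value Alignment in Large Language Models: A Prompt-based Analysis of Schwartz Values in Gemini, ChatGPT, and DeepSeek | Uses prompt-based analysis to reveal differences in cultural value alignment across LLMs on Schwartz values | large language model
26 | Gated Integration of Low-Rank Adaptation for Continual Learning of Large Language Models | Proposes GainLoRA, which uses gated integration of LoRA to address catastrophic forgetting in continual learning of LLMs | large language model
27 | Any Large Language Model Can Be a Reliable Judge: Debiasing with a Reasoning-based Bias Detector | Proposes RBD, a reasoning-based bias detector that makes LLMs more reliable as judges | large language model
28 | SciCUEval: A Comprehensive Dataset for Evaluating Scientific Context Understanding in Large Language Models | SciCUEval: builds a comprehensive dataset to evaluate LLM context understanding in scientific domains | large language model
29 | Are LLMs reliable? An exploration of the reliability of large language models in clinical note generation | Evaluates the reliability of LLMs in clinical note generation and recommends locally deployed small open-source models | large language model
30 | Can Large Language Models Understand Internet Buzzwords Through User-Generated Content | Proposes the RESS method and builds the CHEER dataset to improve LLM understanding of internet buzzwords | large language model
31 | Lost in Benchmarks? Rethinking Large Language Model Benchmarking with Item Response Theory | Uses item response theory to re-examine the validity of LLM benchmarks | large language model
32 | Effective and Efficient Schema-aware Information Extraction Using On-Device Large Language Models | Proposes DLISC, an efficient on-device information extraction method based on dual LoRA and incremental schema caching | large language model
33 | Joint Flashback Adaptation for Forgetting-Resistant Instruction Tuning | Proposes Joint Flashback Adaptation to address catastrophic forgetting in large models during instruction tuning | large language model, instruction following
34 | Aug2Search: Enhancing Facebook Marketplace Search with LLM-Generated Synthetic Data Augmentation | Aug2Search: uses LLM-generated synthetic data to improve Facebook Marketplace search | large language model, multimodal
35 | MIKU-PAL: An Automated and Standardized Multi-Modal Method for Speech Paralinguistic and Affect Labeling | MIKU-PAL: an automated, standardized multimodal method for labeling speech paralinguistics and affect | large language model, multimodal
36 | TurnaboutLLM: A Deductive Reasoning Benchmark from Detective Games | Proposes TurnaboutLLM, a detective-game-based benchmark for LLM deductive reasoning | large language model, chain-of-thought
37 | Multi-Modality Expansion and Retention for LLMs through Parameter Merging and Decoupling | Proposes MMER to address modality expansion and retention in multimodal LLMs | large language model, multimodal
38 | FlowKV: Enhancing Multi-Turn Conversational Coherence in LLMs via Isolated Key-Value Cache Management | FlowKV: improves multi-turn conversational coherence in LLMs through isolated key-value cache management | large language model, instruction following
39 | Web-Shepherd: Advancing PRMs for Reinforcing Web Agents | Web-Shepherd: proposes a process reward model for reinforcing web agents, addressing the lack of dedicated reward models for web navigation | large language model, multimodal
40 | RoT: Enhancing Table Reasoning with Iterative Row-Wise Traversals | Proposes RoT, which enhances table reasoning through iterative row-wise traversals without training | large language model, chain-of-thought
41 | Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective | Proposes using diffusion language models for text embedding, significantly improving long-document and reasoning-intensive retrieval | large language model, instruction following
42 | Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems | Proposes the Spoken-MQA benchmark to evaluate the reasoning of speech models on multi-faceted math problems | large language model, multimodal
43 | Beyond Hard and Soft: Hybrid Context Compression for Balancing Local and Global Information Retention | Proposes HyCo₂, a hybrid context compression method that balances local and global information retention and improves long-text reasoning | large language model
44 | Scaling Physical Reasoning with the PHYSICS Dataset | Proposes the PHYSICS dataset for improving and evaluating LLM physical reasoning | large language model
45 | DEBATE, TRAIN, EVOLVE: Self Evolution of Language Model Reasoning | Proposes the DTE framework, which improves language model reasoning through multi-agent debate and self-evolution training | large language model
46 | Advancing LLM Safe Alignment with Safety Representation Ranking | Proposes Safety Representation Ranking (SRR), which uses LLM internal states to improve safety alignment under adversarial prompts | large language model
47 | Can LLMs $\textit{understand}$ Math? -- Exploring the Pitfalls in Mathematical Reasoning | Proposes the MAPLE metric for holistically evaluating the logical alignment of LLMs in mathematical reasoning | large language model
48 | A quantitative analysis of semantic information in deep representations of text and images | Proposes a quantitative method for analyzing semantic information in deep representations of text and images | large language model
49 | DeFTX: Denoised Sparse Fine-Tuning for Zero-Shot Cross-Lingual Transfer | DeFT-X: zero-shot cross-lingual transfer via denoised sparse fine-tuning | large language model
50 | MAS-ZERO: Designing Multi-Agent Systems with Zero Supervision | MAS-ZERO: a self-evolving framework for designing multi-agent systems without supervision | large language model
51 | Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space | Proposes Soft Thinking, which improves LLM reasoning in a continuous concept space | chain-of-thought
52 | Shared Path: Unraveling Memorization in Multilingual LLMs through Language Similarities | Uses language similarity to reveal memorization in multilingual LLMs | large language model
53 | From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning | Compares human and LLM conceptual structures, revealing how LLMs compress semantics and where this falls short | large language model
54 | UniErase: Towards Balanced and Precise Unlearning in Language Models | UniErase: a balanced and precise unlearning framework for language models that improves both unlearning efficacy and capability retention | large language model
55 | Protoknowledge Shapes Behaviour of LLMs in Downstream Tasks: Memorization and Generalization with Knowledge Graphs | Introduces the concept of protoknowledge and analyzes how LLMs memorize and generalize knowledge graphs in downstream tasks | large language model
56 | RePPL: Recalibrating Perplexity by Uncertainty in Semantic Propagation and Language Generation for Explainable QA Hallucination Detection | Proposes RePPL to make hallucination detection in language models explainable | large language model
57 | Your Language Model Can Secretly Write Like Humans: Contrastive Paraphrase Attacks on LLM-Generated Text Detectors | Proposes CoPA, a training-free contrastive paraphrase attack that effectively fools LLM-generated text detectors | large language model
58 | Prototypical Human-AI Collaboration Behaviors from LLM-Assisted Writing in the Wild | Proposes prototypical human-AI collaboration behaviors (PATHs) to analyze user-AI interaction patterns in LLM-assisted writing | large language model
59 | Explaining Puzzle Solutions in Natural Language: An Exploratory Study on 6x6 Sudoku | Evaluates LLMs on solving 6x6 Sudoku and explaining the solutions in natural language, revealing weaknesses in strategic reasoning | large language model
60 | Generalizable Process Reward Models via Formally Verified Training Data | Proposes FoVer, which uses formal verification to automatically generate training data and improves general-purpose process reward models | large language model
61 | MAPS: A Multilingual Benchmark for Global Agent Performance and Security | MAPS: a benchmark suite for evaluating agent performance and security in multilingual settings | large language model
62 | Learning to Reason via Mixture-of-Thought for Logical Reasoning | Proposes the Mixture-of-Thought (MoT) framework to improve LLM performance in logical reasoning | chain-of-thought
63 | MTR-Bench: A Comprehensive Benchmark for Multi-Turn Reasoning Evaluation | MTR-Bench: builds a comprehensive benchmark for multi-turn reasoning, revealing shortcomings in LLM interactive reasoning | large language model
64 | ToxicTone: A Mandarin Audio Dataset Annotated for Toxicity and Toxic Utterance Tonality | ToxicTone: builds a large-scale Mandarin speech toxicity dataset and proposes a multimodal detection framework | multimodal
65 | Self-Interpretability: LLMs Can Describe Complex Internal Processes that Drive Their Decisions | Self-interpretability: LLMs can describe the complex internal processes that drive their decisions | large language model
66 | VocalBench: Benchmarking the Vocal Conversational Abilities for Speech Interaction Models | VocalBench: a comprehensive benchmark for evaluating the conversational abilities of speech interaction models | large language model
67 | Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen! | Reveals a data leakage risk in fine-tuning open-source LLMs: attackers can extract fine-tuning data through a backdoor | large language model
68 | InfoDeepSeek: Benchmarking Agentic Information Seeking for Retrieval-Augmented Generation | InfoDeepSeek: proposes an agentic RAG benchmark that evaluates agentic information seeking in real, dynamic web environments | large language model
69 | CoLA: Collaborative Low-Rank Adaptation | CoLA: a collaborative low-rank adaptation method that improves multi-task fine-tuning in low-sample settings | large language model
70 | An Empirical Study of the Anchoring Effect in LLMs: Existence, Mechanism, and Potential Mitigations | Shows that LLMs exhibit the anchoring effect and proposes potential mitigation strategies | large language model
71 | X-WebAgentBench: A Multilingual Interactive Web Benchmark for Evaluating Global Agentic System | X-WebAgentBench: a multilingual interactive web benchmark for evaluating global agentic systems | large language model
72 | NL-Debugging: Exploiting Natural Language as an Intermediate Representation for Code Debugging | Proposes the NL-DEBUGGING framework, which uses natural language as an intermediate representation to improve LLM code debugging | large language model
73 | Emotional Supporters often Use Multiple Strategies in a Single Turn | Redefines the emotional support conversation task around the use of multiple strategies in a single turn, and verifies that LLMs excel at the redefined task | large language model
74 | Chinese Toxic Language Mitigation via Sentiment Polarity Consistent Rewrites | ToxiRewriteCN: the first Chinese dataset of sentiment-polarity-consistent rewrites of toxic language, improving LLM detoxification in subtle contexts | large language model
75 | Hallucinate at the Last in Long Response Generation: A Case Study on Long Document Summarization | Reveals a positional bias of hallucinations in long-form generation (they concentrate at the end) and explores mitigation methods | large language model
76 | Multilingual Prompting for Improving LLM Generation Diversity | Proposes multilingual prompting to increase the diversity of LLM generations | large language model
77 | R-TOFU: Unlearning in Large Reasoning Models | Proposes the R-TOFU benchmark for evaluating the effectiveness of unlearning in large reasoning models | chain-of-thought
78 | BanglaByT5: Byte-Level Modelling for Bangla | Proposes BanglaByT5, a byte-level encoder-decoder model for Bangla that improves NLP performance in resource-constrained settings | large language model
79 | DUSK: Do Not Unlearn Shared Knowledge | DUSK benchmark: evaluates selective unlearning by LLMs when data overlap | large language model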

🔬 Pillar 2: RL Algorithms and Architecture (21 papers)

# | Title | One-sentence summary | Tags
80 | Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought | Hunyuan-TurboS: improves LLM performance through Mamba-Transformer synergy and adaptive CoT | reinforcement learning, Mamba, large language model
81 | Systematic Evaluation of Machine-Generated Reasoning and PHQ-9 Labeling for Depression Detection Using Large Language Models | Systematically evaluates LLM reasoning and PHQ-9 labeling for depression detection, and proposes optimization strategies | DPO, direct preference optimization, large language model
82 | Shallow Preference Signals: Large Language Model Aligns Even Better with Truncated Data? | Finds that LLM preference signals are shallow: aligning on truncated data works even better | reinforcement learning, RLHF, DPO
83 | GUI-G1: Understanding R1-Zero-Like Training for Visual Grounding in GUI Agents | Proposes GUI-G1 to address training challenges in visual grounding for GUI agents | reinforcement learning, visual grounding, chain-of-thought
84 | Aligning Dialogue Agents with Global Feedback via Large Language Model Reward Decomposition | Proposes a dialogue agent alignment framework based on LLM reward decomposition that optimizes dialogue quality using only session-level feedback | reward shaping, large language model, multimodal
85 | DISCO Balances the Scales: Adaptive Domain- and Difficulty-Aware Reinforcement Learning on Imbalanced Data | DISCO: adaptive domain- and difficulty-aware reinforcement learning that addresses LLM alignment on imbalanced data | reinforcement learning, policy learning, consistency policy
86 | VerifyBench: Benchmarking Reference-based Reward Systems for Large Language Models | Proposes VerifyBench and VerifyBench-Hard for evaluating reference-based reward systems for LLMs | reinforcement learning, large language model
87 | DayDreamer at CQs-Gen 2025: Generating Critical Questions through Argument Scheme Completion | Proposes the DayDreamer system, based on argument scheme completion, for generating critical questions | dreamer, large language model, chain-of-thought
88 | Self-GIVE: Associative Thinking from Limited Structured Knowledge for Enhanced Large Language Model Reasoning | Self-GIVE: associative thinking from limited structured knowledge to enhance LLM reasoning | reinforcement learning, large language model
89 | ConvSearch-R1: Enhancing Query Reformulation for Conversational Search with Reasoning via Reinforcement Learning | ConvSearch-R1: uses reinforcement learning and retrieval signals to improve query reformulation in conversational search | reinforcement learning, distillation, reward shaping
90 | Deliberation on Priors: Trustworthy Reasoning of Large Language Models on Knowledge Graphs | Proposes the Deliberation over Priors (DP) framework to make LLM reasoning over knowledge graphs more trustworthy | distillation, large language model
91 | NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning | Proposes NOVER, a verifier-free reinforcement learning framework for incentive training of language models | reinforcement learning, large language model
92 | From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning | Proposes a reinforcement-learning-based framework for aligning LLMs with pedagogy, improving the teaching of problem-solving | reinforcement learning, large language model
93 | An Empirical Study on Reinforcement Learning for Reasoning-Search Interleaved LLM Agents | Studies the key factors and effective strategies in reinforcement learning training of reasoning-search interleaved LLM agents | reinforcement learning, large language model
94 | Mechanistic evaluation of Transformers and state space models | Uses causal interventions to reveal mechanistic differences between Transformers and state space models on associative recall tasks | Mamba, SSM, state space model
95 | Reverse Engineering Human Preferences with Reinforcement Learning | Uses reinforcement learning to reverse-engineer human preferences, raising LLM evaluation scores in a way that is hard to detect | reinforcement learning, large language model
96 | On the Generalization vs Fidelity Paradox in Knowledge Distillation | A large-scale analysis reveals the effectiveness of knowledge distillation on small models and a generalization-fidelity paradox | distillation, large language model
97 | Learn to Reason Efficiently with Adaptive Length-based Reward Shaping | Proposes LASER-D, a method based on adaptive length-based reward shaping that makes large reasoning models more efficient | reinforcement learning, reward shaping
98 | TemplateRL: Structured Template-Guided Reinforcement Learning for LLM Reasoning | TemplateRL: structured template-guided reinforcement learning that improves LLM reasoning | reinforcement learning
99 | Teaching Language Models to Evolve with Users: Dynamic Profile Modeling for Personalized Alignment | Proposes the RLPA framework, which achieves personalized alignment through dynamic user profile modeling and improves LLM dialogue | reinforcement learning, large language model
100 | When Less Language is More: Language-Reasoning Disentanglement Makes LLMs Better Multilingual Reasoners | Proposes a language-reasoning disentanglement method that improves the multilingual reasoning of LLMs | reinforcement learning, large language model

🔬 Pillar 1: Robot Control (2 papers)

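# | Title | One-sentence summary | Tags
101 | Enhancing Large Language Models for Detecting Mental Manipulation via Annotation-Free Data Augmentation and Anti-Curriculum Distillation | Proposes the MentalMAC framework, which improves LLM detection of mental manipulation through annotation-free data augmentation and anti-curriculum distillation | manipulation, distillation, large language model
102 | Fooling the LVLM Judges: Visual Biases in LVLM-Based Evaluation | Reveals visual biases in LVLM-based evaluation: adversarial image manipulation can fool LVLM judges | manipulation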

🔬 Pillar 6: Video Extraction and Matching (1 paper)

# | Title | One-sentence summary | Tags
103 | RAVEN: Query-Guided Representation Alignment for Question Answering over Audio, Video, Embedded Sensors, and Natural Language | RAVEN: proposes query-guided representation alignment for multimodal question answering | egocentric, large language model, multimodal

🔬 Pillar 3: Spatial Perception and Semantics (1 paper)

# | Title | One-sentence summary | Tags
104 | The Representational Alignment between Humans and Language Models is implicitly driven by a Concreteness Effect | Explores the implicit semantic alignment between humans and language models | implicit representation
