cs.CL (2025-05-27)

📊 78 papers in total | 🔗 14 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (65 🔗11) · Pillar 2: RL Algorithms & Architecture (9 🔗2) · Pillar 1: Robot Control (2 🔗1) · Pillar 7: Motion Retargeting (1) · Pillar 3: Spatial Perception & Semantics (1)

🔬 Pillar 9: Embodied Foundation Models (65 papers)

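# | Title | One-line summary | Tags | 🔗
1 | Leveraging Large Language Models for Bengali Math Word Problem Solving with Chain of Thought Reasoning | Proposes the SOMADHAN dataset for solving Bengali math word problems | large language model, chain-of-thought
2 | Evaluating and Steering Modality Preferences in Multimodal Large Language Model | Proposes the MC² benchmark to evaluate modality preferences in multimodal large language models and steers those preferences via representation engineering | large language model, multimodal
3 | Unveiling Instruction-Specific Neurons & Experts: An Analytical Framework for LLM's Instruction-Following Capabilities | Reveals instruction-specific neurons and experts: a framework for analyzing the instruction-following capabilities of LLMs | large language model, instruction following
4 | Rethinking Information Synthesis in Multimodal Question Answering A Multi-Agent Perspective | Proposes MAMMQA, a multi-agent framework that improves the accuracy and interpretability of multimodal question answering | large language model, multimodal
5 | Explaining Large Language Models with gSMILE | Proposes the gSMILE framework to improve token-level interpretability of large language models | large language model
6 | LayerIF: Estimating Layer Quality for Large Language Models using Influence Functions | LayerIF: uses influence functions to estimate the training quality of each layer in large language models | large language model
7 | Test-Time Learning for Large Language Models | Proposes TLM, a test-time learning method for large language models that improves adaptation to domain knowledge | large language model
8 | Rethinking the Outlier Distribution in Large Language Models: An In-depth Study | An in-depth study of outlier distributions in large language models, aimed at improving quantization performance | large language model
9 | How does Misinformation Affect Large Language Model Behaviors and Preferences? | Proposes the MisBench benchmark to analyze and improve the ability of large language models to discern misinformation | large language model
10 | RelationalFactQA: A Benchmark for Evaluating Tabular Fact Retrieval from Large Language Models | Proposes the RelationalFactQA benchmark to evaluate how well LLMs retrieve tabular facts from parametric knowledge | large language model
11 | Who Reasons in the Large Language Models? | Finds that LLM reasoning ability stems mainly from the output projection module in the Transformer | large language model
12 | Multi-objective Large Language Model Alignment with Hierarchical Experts | Proposes HoE, a lightweight, parameter-efficient, plug-and-play method for multi-objective large language model alignment | large language model
13 | Automatic Transmission for LLM Tiers: Optimizing Cost and Accuracy in Large Language Models | Proposes the LLM-AT framework, which selects an LLM tier automatically without training to optimize cost and accuracy | large language model
14 | DenseLoRA: Dense Low-Rank Adaptation of Large Language Models | DenseLoRA: improves the parameter efficiency and performance of large language models with dense low-rank matrices | large language model
15 | DLP: Dynamic Layerwise Pruning in Large Language Models | DLP: a dynamic layerwise pruning method for large language models that improves performance at high sparsity | large language model
16 | CogniBench: A Legal-inspired Framework and Dataset for Assessing Cognitive Faithfulness of Large Language Models | Proposes CogniBench for assessing the cognitive faithfulness of large language models | large language model
17 | From prosthetic memory to prosthetic denial: Auditing whether large language models are prone to mass atrocity denialism | Audits the tendency of large language models toward mass atrocity denial, revealing a potential risk of "prosthetic denial" | large language model
18 | Rethinking Data Mixture for Large Language Models: A Comprehensive Survey and New Perspectives | A comprehensive survey of, and new perspectives on, data mixture methods for large language models, aimed at optimizing training data proportions | large language model
19 | DecisionFlow: Advancing Large Language Model as Principled Decision Maker | DecisionFlow: improves the ability of large language models to make rational decisions in decision-making scenarios | large language model
20 | Leveraging large language models and traditional machine learning ensembles for ADHD detection from narrative transcripts | Proposes a framework that combines LLMs with traditional ML ensembles to detect ADHD from narrative transcripts | large language model
21 | Assessment of L2 Oral Proficiency using Speech Large Language Models | Uses speech large language models to assess L2 oral proficiency, significantly improving assessment performance and generalization | large language model
22 | RPM: Reasoning-Level Personalization for Black-Box Large Language Models | RPM: a reasoning-level personalization framework for black-box large language models | large language model
23 | Uncertainty Unveiled: Can Exposure to More In-context Examples Mitigate Uncertainty for Large Language Models? | Mitigates uncertainty in large language models by increasing the number of in-context examples | large language model
24 | Research Community Perspectives on "Intelligence" and Large Language Models | Surveys researchers' perceptions and expectations of "intelligence" and large language models | large language model
25 | On VLMs for Diverse Tasks in Multimodal Meme Classification | Proposes a new approach that combines vision-language models with language models to improve multimodal meme classification | multimodal
26 | Automated Privacy Information Annotation in Large Language Model Interactions | Builds a large-scale privacy information annotation dataset for evaluating privacy leakage risks in LLM interactions | large language model
27 | STEER-BENCH: A Benchmark for Evaluating the Steerability of Large Language Models | Proposes the Steer-Bench benchmark for evaluating the steerability of large language models under group-specific norms | large language model
28 | SV-TrustEval-C: Evaluating Structure and Semantic Reasoning in Large Language Models for Source Code Vulnerability Analysis | SV-TrustEval-C: evaluates the structural and semantic reasoning of large language models in C code vulnerability analysis | large language model
29 | Towards Pretraining Robust ASR Foundation Model with Acoustic-Aware Data Augmentation | Proposes an acoustic-aware data augmentation method for speech recognition to improve model generalization | foundation model
30 | Self-Route: Automatic Mode Switching via Capability Estimation for Efficient Reasoning | Proposes Self-Route, which switches reasoning modes automatically via capability estimation to improve LLM reasoning efficiency | large language model, chain-of-thought
31 | AutoJudger: An Agent-Driven Framework for Efficient Benchmarking of MLLMs | Proposes AutoJudger, an agent-driven approach to evaluating multimodal large language models efficiently | large language model, multimodal
32 | Visual Cues Enhance Predictive Turn-Taking for Two-Party Human Interaction | Proposes MM-VAP to improve predictive turn-taking in two-party human interaction | multimodal
33 | Predicting Implicit Arguments in Procedural Video Instructions | Proposes the Implicit-VidSRL dataset and uses the iSRL-Qwen2-VL model to improve prediction of implicit semantic roles in video instructions | multimodal
34 | LLMPR: A Novel LLM-Driven Transfer Learning based Petition Ranking Model | LLMPR: an LLM-driven, transfer-learning-based petition ranking model for streamlining judicial processes | large language model
35 | R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing | R2R: navigates divergent reasoning paths efficiently via small-large model token routing | large language model
36 | Factual Self-Awareness in Language Models: Representation, Robustness, and Scaling | Reveals factual self-awareness in large language models: internal representation, robustness, and scaling effects | large language model
37 | Exploring the Hidden Capacity of LLMs for One-Step Text Generation | Reveals the potential of LLMs for one-step text generation: hundreds of tokens generated from just two embeddings | large language model
38 | A Lightweight Multi-Expert Generative Language Model System for Engineering Information and Knowledge Extraction | Proposes SLG, a lightweight multi-expert generative language model system for engineering information and knowledge extraction | large language model
39 | SpecExtend: A Drop-in Enhancement for Speculative Decoding of Long Sequences | SpecExtend: a drop-in enhancement for speculative decoding of long sequences | large language model
40 | CodeMirage: A Multi-Lingual Benchmark for Detecting AI-Generated and Paraphrased Source Code from Production-Level LLMs | CodeMirage: a multilingual benchmark for detecting AI-generated and paraphrased source code from production-level LLMs | large language model
41 | REAL-Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning | REAL-Prover: a retrieval-augmented Lean theorem prover for mathematical reasoning | large language model
42 | Calibrating LLMs for Text-to-SQL Parsing by Leveraging Sub-clause Frequencies | Proposes an LLM calibration method based on sub-clause frequencies to improve confidence estimation in Text-to-SQL parsing | large language model
43 | RefTool: Enhancing Model Reasoning with Reference-Guided Tool Creation | RefTool: uses reference materials to guide tool creation and enhance model reasoning | large language model
44 | Improving Research Idea Generation Through Data: An Empirical Investigation in Social Science | Proposes a data-enhanced LLM approach to research idea generation, improving feasibility and quality in social science research | large language model
45 | Evaluating LLM Adaptation to Sociodemographic Factors: User Profile vs. Dialogue History | Proposes an evaluation framework for studying how LLMs adapt to sociodemographic factors given user profiles versus dialogue history | large language model
46 | Pretrained LLMs Learn Multiple Types of Uncertainty | Shows that pretrained LLMs capture multiple types of uncertainty without being explicitly trained to do so | large language model
47 | BLUCK: A Benchmark Dataset for Bengali Linguistic Understanding and Cultural Knowledge | Proposes BLUCK, a benchmark dataset for Bengali linguistic understanding and cultural knowledge | large language model
48 | MARS-Bench: A Multi-turn Athletic Real-world Scenario Benchmark for Dialogue Evaluation | Proposes MARS-Bench for evaluating LLM robustness in multi-turn dialogue about sporting events | large language model
49 | LLM-Driven E-Commerce Marketing Content Optimization: Balancing Creativity and Conversion | Proposes an LLM-based framework for optimizing e-commerce marketing content that balances creativity and conversion rate | multimodal
50 | Trans-EnV: A Framework for Evaluating the Linguistic Robustness of LLMs Against English Varieties | Trans-EnV: a framework for evaluating the linguistic robustness of LLMs across English varieties | large language model
51 | Calibrating LLM Confidence by Probing Perturbed Representation Stability | CCPS: calibrates large language model confidence by probing the stability of perturbed representations | large language model
52 | Do We Know What LLMs Don't Know? A Study of Consistency in Knowledge Probing | Reveals inconsistencies in knowledge probing of large language models and stresses the importance of robust probing frameworks | large language model
53 | MAKIEval: A Multilingual Automatic WiKidata-based Framework for Cultural Awareness Evaluation for LLMs | MAKIEval: a multilingual, Wikidata-based framework for evaluating the cultural awareness of LLMs | large language model
54 | Are Language Models Consequentialist or Deontological Moral Reasoners? | Proposes a classification framework for moral reasoning to analyze the ethical judgments of language models | large language model
55 | Do LLMs Need to Think in One Language? Correlation between Latent Language and Task Performance | Studies how latent-language consistency in LLMs affects downstream task performance and finds that it is not always necessary | large language model
56 | PEDANTIC: A Dataset for the Automatic Examination of Definiteness in Patent Claims | Proposes the PEDANTIC dataset for automatically examining indefiniteness in patent claims | large language model
57 | Charting the Landscape of African NLP: Mapping Progress and Shaping the Road Ahead | Surveys progress in African natural language processing research, analyzing the current state and outlining future directions | large language model
58 | rStar-Coder: Scaling Competitive Code Reasoning with a Large-Scale Verified Dataset | rStar-Coder: builds a large-scale verified dataset to improve LLM code reasoning | large language model
59 | Towards Objective Fine-tuning: How LLMs' Prior Knowledge Causes Potential Poor Calibration? | Proposes CogCalib to address calibration problems in LLM fine-tuning | large language model
60 | MSA at SemEval-2025 Task 3: High Quality Weak Labeling and LLM Ensemble Verification for Multilingual Hallucination Detection | MSA proposes high-quality weak labeling and LLM ensemble verification for multilingual hallucination detection | large language model
61 | Concealment of Intent: A Game-Theoretic Analysis | Proposes an intent-concealing adversarial prompting attack and analyzes LLM attack and defense strategies with game theory | large language model
62 | CHIMERA: A Knowledge Base of Scientific Idea Recombinations for Research Analysis and Ideation | Proposes CHIMERA, a knowledge base for analyzing scientific idea recombination and inspiring research ideation | large language model
63 | Beyond Templates: Dynamic Adaptation of Reasoning Demonstrations via Feasibility-Aware Exploration | Proposes the DART framework, which adapts reasoning demonstrations dynamically through feasibility-aware exploration to improve small-model reasoning | large language model
64 | Long Context Scaling: Divide and Conquer via Multi-Agent Question-driven Collaboration | Proposes the XpandA framework, which improves LLM long-context processing through multi-agent, question-driven collaboration | large language model
65 | POLAR: A Benchmark for Multilingual, Multicultural, and Multi-Event Online Polarization | POLAR: a benchmark dataset for multilingual, multicultural, and multi-event online polarization | large language model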

🔬 Pillar 2: RL Algorithms & Architecture (9 papers)

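# | Title | One-line summary | Tags | 🔗
66 | EasyDistill: A Comprehensive Toolkit for Effective Knowledge Distillation of Large Language Models | EasyDistill: a comprehensive toolkit for knowledge distillation of large language models | reinforcement learning, distillation, large language model
67 | Towards Better Instruction Following Retrieval Models | Proposes the InF-IR dataset and the InF-Embed model to improve instruction-following information retrieval | representation learning, contrastive learning, instruction following
68 | Walk Before You Run! Concise LLM Reasoning via Reinforcement Learning | Proposes ConciseR to address redundancy in LLM reasoning | reinforcement learning, large language model, chain-of-thought
69 | A Course Correction in Steerability Evaluation: Revealing Miscalibration and Side Effects in LLMs | Proposes a multi-dimensional goal-space framework that reveals miscalibration and side effects in LLM steerability evaluation | reinforcement learning, large language model, instruction following
70 | FCKT: Fine-Grained Cross-Task Knowledge Transfer with Semantic Contrastive Learning for Targeted Sentiment Analysis | Proposes the FCKT framework, which improves targeted sentiment analysis through fine-grained cross-task knowledge transfer and semantic contrastive learning | contrastive learning, large language model
71 | TAT-R1: Terminology-Aware Translation with Reinforcement Learning and Word Alignment | Proposes TAT-R1, which uses reinforcement learning and word alignment to improve terminology translation quality | reinforcement learning, large language model
72 | Contrastive Learning on LLM Back Generation Treebank for Cross-domain Constituency Parsing | Proposes a contrastive learning method based on LLM back generation to improve cross-domain constituency parsing | contrastive learning, large language model
73 | SeqPO-SiMT: Sequential Policy Optimization for Simultaneous Machine Translation | Proposes the SeqPO-SiMT framework, which uses sequential policy optimization to improve simultaneous machine translation quality and reduce latency | reinforcement learning, PPO, RLHF
74 | Divide-Then-Align: Honest Alignment based on the Knowledge Boundary of RAG | Proposes Divide-Then-Align to address the knowledge boundary problem in RAG systems | DPO, direct preference optimization, large language model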

🔬 Pillar 1: Robot Control (2 papers)

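# | Title | One-line summary | Tags | 🔗
75 | SELF-PERCEPT: Introspection Improves Large Language Models' Detection of Multi-Person Mental Manipulation in Conversations | SELF-PERCEPT: uses introspection to improve how well large language models detect multi-person mental manipulation in conversations | manipulation, large language model
76 | Tracing and Reversing Rank-One Model Edits | Proposes methods for tracing and reversing Rank-One model edits, protecting LLMs against malicious tampering | manipulation, large language model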

🔬 Pillar 7: Motion Retargeting (1 paper)

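# | Title | One-line summary | Tags | 🔗
77 | Can LLMs Learn to Map the World from Local Descriptions? | Explores the ability of LLMs to build global spatial cognition from local descriptions, with applications to spatial perception and navigation | spatial relationship, large language model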

🔬 Pillar 3: Spatial Perception & Semantics (1 paper)

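# | Title | One-line summary | Tags | 🔗
78 | Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity | Proposes the Mixture of Grouped Experts (MoGE) architecture to improve the training and inference efficiency of sparse models on Ascend NPUs | MoGe, large language model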
