cs.CL(2025-05-20)

📊 109 papers in total | 🔗 15 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (83 🔗12) · Pillar 2: RL & Architecture (24 🔗3) · Pillar 8: Physics-based Animation (1) · Pillar 1: Robot Control (1)

🔬 Pillar 9: Embodied Foundation Models (83 papers)

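# | Title | One-sentence summary | Tags | 🔗
1 | Internal Chain-of-Thought: Empirical Evidence for Layer-wise Subtask Scheduling in LLMs | Reveals an internal chain of thought in LLMs: an empirical study of layer-wise subtask scheduling. | large language model, chain-of-thought
2 | ABBA-Adapters: Efficient and Expressive Fine-Tuning of Foundation Models | Proposes ABBA-Adapters, an efficient and expressive fine-tuning method that improves foundation model performance. | large language model, foundation model
3 | EfficientLLM: Efficiency in Large Language Models | EfficientLLM: a comprehensive study of efficiency benchmarks and optimization techniques for large language models. | large language model, foundation model
4 | ModRWKV: Transformer Multimodality in Linear Time | Proposes ModRWKV, a linear-time multimodal Transformer framework based on RWKV7. | large language model, multimodal
5 | Enhanced Multimodal Aspect-Based Sentiment Analysis by LLM-Generated Rationales | Proposes the LRSA framework, which uses LLM-generated rationales to improve SLM performance in multimodal sentiment analysis. | large language model, multimodal
6 | CAFES: A Collaborative Multi-Agent Framework for Multi-Granular Multimodal Essay Scoring | Proposes the CAFES framework to address the limitations of multimodal automated scoring. | large language model, multimodal
7 | DecIF: Improving Instruction-Following through Meta-Decomposition | DecIF: improves the instruction-following ability of large language models through meta-decomposition. | large language model, instruction following
8 | Large Language Models Implicitly Learn to See and Hear Just By Reading | Large language models implicitly learn visual and auditory abilities just by reading text. | large language model
9 | Saten: Sparse Augmented Tensor Networks for Post-Training Compression of Large Language Models | Proposes Saten, a sparse augmented tensor network for post-training compression of large language models. | large language model
10 | Cheaper, Better, Faster, Stronger: Robust Text-to-SQL without Chain-of-Thought or Fine-Tuning | Proposes the N-rep consistency method for low-cost, highly robust Text-to-SQL without CoT or fine-tuning. | chain-of-thought
11 | Scaling Laws for State Dynamics in Large Language Models | Reveals the challenges large language models face in modeling state dynamics and probes their internal state-tracking mechanisms. | large language model
12 | Toward Reliable Scientific Hypothesis Generation: Evaluating Truthfulness and Hallucination in Large Language Models | Proposes the TruthHypo benchmark and the KnowHD detector to evaluate truthfulness and hallucination in LLM-generated scientific hypotheses. | large language model
13 | Attributional Safety Failures in Large Language Models under Code-Mixed Perturbations | Reveals attributional safety failures in large language models under code-mixed perturbations and proposes repair strategies. | large language model
14 | Neural Incompatibility: The Unbridgeable Gap of Cross-Scale Parametric Knowledge Transfer in Large Language Models | Reveals the neural incompatibility problem in cross-scale parametric knowledge transfer for large language models. | large language model
15 | DiagnosisArena: Benchmarking Diagnostic Reasoning for Large Language Models | DiagnosisArena: builds a diagnostic reasoning benchmark to evaluate large language models in medical diagnosis. | large language model
16 | Development and Validation of Engagement and Rapport Scales for Evaluating User Experience in Multimodal Dialogue Systems | Develops and validates engagement and rapport scales for evaluating user experience in multimodal dialogue systems. | multimodal
17 | Multimodal Cultural Safety: Evaluation Framework and Alignment Strategies | Proposes the CROSS benchmark and the CROSS-Eval framework to improve the cultural safety awareness and compliance of LVLMs. | multimodal
18 | DECASTE: Unveiling Caste Stereotypes in Large Language Models through Multi-Dimensional Bias Analysis | Proposes the DECASTE framework to uncover caste bias in large language models. | large language model
19 | Teaching Audio-Aware Large Language Models What Does Not Hear: Mitigating Hallucinations through Synthesized Negative Samples | Proposes LISTEN, which mitigates hallucinations in audio-aware large language models using synthesized negative samples. | large language model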
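20 | S2SBench: A Benchmark for Quantifying Intelligence Degradation in Speech-to-Speech Large Language Models | S2SBench: a benchmark for quantifying intelligence degradation in speech-to-speech large language models. | large language model
21 | OmniGenBench: A Modular Platform for Reproducible Genomic Foundation Models Benchmarking | OmniGenBench: a modular platform for reproducible benchmarking of genomic foundation models. | foundation model
22 | QA-prompting: Improving Summarization with Large Language Models using Question-Answering | Proposes QA-prompting, which uses question answering to improve long-text summarization with large language models. | large language model
23 | Cross-Lingual Optimization for Language Transfer in Large Language Models | Proposes Cross-Lingual Optimization (CLO), which improves cross-lingual transfer in large language models while preserving English performance. | large language model
24 | Unraveling Interwoven Roles of Large Language Models in Authorship Privacy: Obfuscation, Mimicking, and Verification | Builds a unified framework to explore the roles of large language models in authorship privacy: obfuscation, mimicking, and verification. | large language model
25 | Beyond Chains: Bridging Large Language Models and Knowledge Bases in Complex Question Answering | Proposes the PDRR framework to bridge large language models and knowledge bases in complex question answering. | large language model
26 | ShieldVLM: Safeguarding the Multimodal Implicit Toxicity via Deliberative Reasoning with LVLMs | Proposes ShieldVLM, which uses deliberative reasoning to improve LVLM safety in multimodal implicit toxicity detection. | multimodal
27 | AUTOLAW: Enhancing Legal Compliance in Large Language Models via Case Law Generation and Jury-Inspired Deliberation | AutoLaw: enhances legal compliance in large language models through case law generation and jury-inspired deliberation. | large language model
28 | Activation-Guided Consensus Merging for Large Language Models | Proposes Activation-Guided Consensus Merging (ACM) to improve large language model merging. | large language model
29 | Mixed Signals: Understanding Model Disagreement in Multimodal Empathy Detection | Studies model prediction disagreement in multimodal empathy detection, revealing latent ambiguity under modality conflict. | multimodal
30 | Informatics for Food Processing | Proposes FoodProX and multimodal AI models to make food processing assessment more objective and scalable. | large language model, multimodal
31 | Amadeus-Verbo Technical Report: The powerful Qwen2.5 family models trained in Portuguese | Amadeus-Verbo: fine-tuning and open-source release of Qwen2.5-family large language models for Brazilian Portuguese. | large language model, foundation model
32 | PersonaTAB: Predicting Personality Traits using Textual, Acoustic, and Behavioral Cues in Fully-Duplex Speech Dialogs | PersonaTAB: predicts personality traits from textual, acoustic, and behavioral cues in full-duplex speech dialogs. | large language model, TAMP
33 | Self-Reasoning Language Models: Unfold Hidden Reasoning Chains with Few Reasoning Catalyst | Proposes Self-Reasoning Language Models (SRLM), which iteratively improve complex reasoning using a few reasoning catalysts. | large language model, chain-of-thought
34 | Mapping the Minds of LLMs: A Graph-Based Analysis of Reasoning LLM | Proposes a graph-based analysis framework to improve understanding of reasoning large language models. | large language model, chain-of-thought
35 | Too Long, Didn't Model: Decomposing LLM Long-Context Understanding With Novels | Proposes the TLDM benchmark, revealing that LLM performance on long-novel understanding drops significantly beyond 64k tokens. | large language model
36 | EasyMath: A 0-shot Math Benchmark for SLMs | EasyMath: a zero-shot math reasoning benchmark for small language models. | chain-of-thought
37 | Automated Journalistic Questions: A New Method for Extracting 5W1H in French | Proposes an automated 5W1H extraction pipeline for French news, with performance comparable to GPT-4o. | large language model
38 | UltraEdit: Training-, Subject-, and Memory-Free Lifelong Editing in Language Models | UltraEdit: a training-free, subject-free, and memory-free lifelong editing method for language models. | large language model
39 | WirelessMathBench: A Mathematical Modeling Benchmark for LLMs in Wireless Communications | WirelessMathBench: a benchmark for evaluating the mathematical modeling ability of large language models in wireless communications. | large language model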
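40 | Temporal Alignment of Time Sensitive Facts with Activation Engineering | Uses activation engineering to align time-sensitive facts in LLMs, improving temporal awareness without training. | large language model
41 | Through a Compressed Lens: Investigating The Impact of Quantization on Factual Knowledge Recall | Studies the impact of quantization on factual knowledge recall in large language models, revealing the information loss that quantization introduces. | large language model
42 | Mechanistic Interpretability of GPT-like Models on Summarization Tasks | Proposes an interpretability analysis framework for GPT-like models on summarization tasks and achieves performance gains. | large language model
43 | WebNovelBench: Placing LLM Novelists on the Web Novel Distribution | Proposes WebNovelBench to evaluate LLMs on long-form novel generation and compare them against the distribution of real web novels. | large language model
44 | Creative Preference Optimization | Proposes Creative Preference Optimization (CrPO) to improve the novelty, diversity, and quality of content generated by large language models. | large language model
45 | MUG-Eval: A Proxy Evaluation Framework for Multilingual Generation Capabilities in Any Language | MUG-Eval: a language-agnostic proxy evaluation framework for assessing LLM generation capabilities in any language. | large language model
46 | GemMaroc: Unlocking Darija Proficiency in LLMs with Minimal Data | GemMaroc: improves LLM proficiency in Moroccan Arabic (Darija) with minimal data. | large language model
47 | Tokenization Constraints in LLMs: A Study of Symbolic and Arithmetic Reasoning Limits | Reveals how tokenization constraints in LLMs limit symbolic and arithmetic reasoning, and introduces the concept of Token Awareness. | chain-of-thought
48 | A Personalized Conversational Benchmark: Towards Simulating Personalized Conversations | PersonaConvBench: a large-scale personalized conversation benchmark for evaluating LLM reasoning and generation in multi-turn conversations. | large language model
49 | GloSS over Toxicity: Understanding and Mitigating Toxicity in LLMs via Global Toxic Subspace | GloSS: suppresses toxic generation in LLMs via a global toxic subspace. | large language model
50 | From Unaligned to Aligned: Scaling Multilingual LLMs with Multi-Way Parallel Corpora | Uses multi-way parallel corpora to improve the cross-lingual semantic understanding of multilingual large language models. | large language model
51 | FlashThink: An Early Exit Method For Efficient Reasoning | FlashThink: an early-exit method for efficient reasoning. | large language model
52 | EEG-to-Text Translation: A Model for Deciphering Human Brain Activity | Proposes the R1 Translator model to improve decoding of EEG signals into text. | large language model
53 | ConspEmoLLM-v2: A robust and stable model to detect sentiment-transformed conspiracy theories | ConspEmoLLM-v2: a robust and stable model for detecting sentiment-transformed conspiracy theories. | large language model
54 | Concept Incongruence: An Exploration of Time and Death in Role Playing | Explores concept incongruence around time and death in role playing, revealing latent problems in LLMs. | large language model
55 | Incorporating Token Usage into Prompting Strategy Evaluation | Proposes the Big-$O_{tok}$ framework to evaluate the token efficiency of prompting strategies and optimize large language model applications. | large language model
56 | SEPS: A Separability Measure for Robust Unlearning in LLMs | Proposes the SEPS evaluation framework and MP mixed-prompt learning to improve LLM unlearning in mixed-query scenarios. | large language model
57 | Tracing Multilingual Factual Knowledge Acquisition in Pretraining | Traces the acquisition of multilingual factual knowledge during pretraining, revealing two mechanisms: frequency-driven learning and cross-lingual transfer. | large language model
58 | Language Mixing in Reasoning Language Models: Patterns, Impact, and Internal Causes | Systematically studies language mixing in reasoning language models, revealing its patterns, impact, and internal causes. | chain-of-thought
59 | sudoLLM: On Multi-role Alignment of Language Models | sudoLLM: a multi-role alignment framework that improves LLM safety under user access control. | large language model
60 | TRATES: Trait-Specific Rubric-Assisted Cross-Prompt Essay Scoring | Proposes the TRATES framework, which uses LLMs and rubrics for trait-specific cross-prompt essay scoring. | large language model
61 | Breaking Bad Tokens: Detoxification of LLMs Using Sparse Autoencoders | Detoxifies LLMs using sparse autoencoders: breaking bad tokens. | large language model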
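62 | MoMoE: Mixture of Moderation Experts Framework for AI-Assisted Online Governance | Proposes the MoMoE framework for explainable, cross-community AI-assisted online content moderation. | large language model
63 | Rank-K: Test-Time Reasoning for Listwise Reranking | Rank-K: a test-time reasoning method for listwise reranking that improves results on hard queries. | large language model
64 | From Templates to Natural Language: Generalization Challenges in Instruction-Tuned LLMs for Spatial Reasoning | Studies the challenges instruction-tuned LLMs face in generalizing from templates to natural language in spatial reasoning. | large language model
65 | Pierce the Mists, Greet the Sky: Decipher Knowledge Overshadowing via Knowledge Circuit Analysis | Proposes the PhantomCircuit framework, which addresses knowledge overshadowing in LLMs through knowledge circuit analysis. | large language model
66 | Is Your Prompt Safe? Investigating Prompt Injection Attacks Against Open-Source LLMs | Studies prompt injection attacks against open-source LLMs and presents new attack methods. | large language model
67 | Dual Decomposition of Weights and Singular Value Low Rank Adaptation | DuDe: a low-rank adaptation method based on weight decomposition and singular value decomposition that improves the stability and knowledge-transfer efficiency of LLM fine-tuning. | large language model
68 | OSoRA: Output-Dimension and Singular-Value Initialized Low-Rank Adaptation | OSoRA: an output-dimension and singular-value initialized low-rank adaptation method for efficient fine-tuning of large language models. | large language model
69 | Teaching Small Language Models to Learn Logic through Meta-Learning | Trains small language models to learn logical reasoning through meta-learning. | large language model
70 | JOLT-SQL: Joint Loss Tuning of Text-to-SQL with Confusion-aware Noisy Schema Sampling | JOLT-SQL: jointly tunes the Text-to-SQL loss using confusion-aware noisy schema sampling. | large language model
71 | Universal Acoustic Adversarial Attacks for Flexible Control of Speech-LLMs | Proposes universal acoustic adversarial attacks on speech LLMs that enable flexible control. | large language model
72 | ThinkSwitcher: When to Think Hard, When to Think Fast | Proposes ThinkSwitcher, which dynamically switches CoT reasoning modes to improve large language model efficiency. | chain-of-thought
73 | SlangDIT: Benchmarking LLMs in Interpretative Slang Translation | Proposes the SlangDIT benchmark and the SlangOWL model to improve LLM performance in interpretative slang translation. | large language model
74 | The Strawberry Problem: Emergence of Character-level Understanding in Tokenized Language Models | Proposes a lightweight architectural modification to address the character-level understanding problem. | large language model
75 | Legal Rule Induction: Towards Generalizable Principle Discovery from Analogous Judicial Precedents | Proposes the legal rule induction task and a benchmark dataset to improve LLMs' ability to discover legal principles from precedents. | large language model
76 | MultiHal: Multilingual Dataset for Knowledge-Graph Grounded Evaluation of LLM Hallucinations | MultiHal: a multilingual dataset for knowledge-graph-grounded evaluation of LLM hallucinations. | large language model
77 | BAR: A Backward Reasoning based Agent for Complex Minecraft Tasks | Proposes BAR to address reasoning problems in complex Minecraft tasks. | large language model
78 | Enhancing LLMs via High-Knowledge Data Selection | Proposes the High-Knowledge Scorer (HKS) to improve LLM performance on knowledge-intensive and general understanding tasks. | large language model
79 | Beyond Text: Unveiling Privacy Vulnerabilities in Multi-modal Retrieval-Augmented Generation | Reveals new privacy vulnerabilities in multimodal retrieval-augmented generation systems and proposes a compositional structured prompt attack. | multimodal
80 | Cross-Linguistic Transfer in Multilingual NLP: The Role of Language Families and Morphology | Studies the influence of language families and morphology on cross-lingual transfer in multilingual NLP. | zero-shot transfer
81 | Let's Verify Math Questions Step by Step | Proposes MathQ-Verify to verify the validity of math questions and improve math QA data quality. | large language model
82 | PandaGuard: Systematic Evaluation of LLM Safety against Jailbreaking Attacks | PandaGuard: systematically evaluates the safety defenses of large language models against jailbreaking attacks. | large language model
83 | Improve Language Model and Brain Alignment via Associative Memory | Improves alignment between language models and the human brain by incorporating associative memory. | large language model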

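🔬 Pillar 2: RL & Architecture (24 papers)

# | Title | One-sentence summary | Tags | 🔗
84 | Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models | Proposes the MathIF benchmark, revealing a trade-off between reasoning ability and instruction following in large reasoning models. | reinforcement learning, large language model, instruction following
85 | Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning | Proposes CoT-Bridge to bridge "leaps" in chain-of-thought reasoning and improve model performance. | reinforcement learning, large language model, chain-of-thought
86 | Toward Effective Reinforcement Learning Fine-Tuning for Medical VQA in Vision-Language Models | Studies the effectiveness of reinforcement learning fine-tuning of vision-language models for medical VQA. | reinforcement learning, large language model, multimodal
87 | FuxiMT: Sparsifying Large Language Models for Chinese-Centric Multilingual Machine Translation | FuxiMT: a sparsified large language model for Chinese-centric multilingual machine translation. | curriculum learning, large language model
88 | Game-RL: Synthesizing Multimodal Verifiable Game Data to Boost VLMs' General Reasoning | Proposes the Game-RL framework, which uses verifiable game data to boost the general reasoning of vision-language models. | reinforcement learning, multimodal
89 | Interpretable Traces, Unexpected Outcomes: Investigating the Disconnect in Trace-Based Knowledge Distillation | Investigates the disconnect between interpretable traces and final outcomes in knowledge distillation based on reasoning traces. | distillation, chain-of-thought
90 | Advancing Multi-Agent RAG Systems with Minimalist Reinforcement Learning | Proposes the Mujica-MyGo framework, which addresses LLM long-context reasoning through multi-agent RAG and minimalist reinforcement learning. | reinforcement learning, large language model
91 | Reward Reasoning Model | Proposes the Reward Reasoning Model (RRM), which uses a reasoning process to improve reward model performance. | reinforcement learning, large language model, chain-of-thought
92 | General-Reasoner: Advancing LLM Reasoning Across All Domains | Proposes General-Reasoner to improve LLM reasoning across domains, addressing data scarcity and answer diversity. | reinforcement learning, large language model, chain-of-thought
93 | Context Reasoner: Incentivizing Reasoning Capability for Contextualized Privacy and Safety Compliance via Reinforcement Learning | Proposes Context Reasoner, which uses reinforcement learning to improve LLM contextual reasoning for safety and privacy compliance. | reinforcement learning, large language model
94 | Hidden Ghost Hand: Unveiling Backdoor Vulnerabilities in MLLM-Powered Mobile GUI Agents | AgentGhost: reveals backdoor vulnerabilities in mobile GUI agents powered by multimodal large language models. | contrastive learning, large language model, multimodal
95 | Improved Methods for Model Pruning and Knowledge Distillation | Proposes MAMA Pruning, an improved model pruning and knowledge distillation method that improves large language model performance. | distillation, large language model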
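96 | InfiGFusion: Graph-on-Logits Distillation via Efficient Gromov-Wasserstein for Model Fusion | InfiGFusion: an efficient Gromov-Wasserstein model fusion method based on graph-on-logits distillation that improves fusion quality. | distillation, large language model
97 | KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation | KORGym: a dynamic game platform for evaluating the reasoning ability of large language models. | reinforcement learning, large language model
98 | Log-Augmented Generation: Scaling Test-Time Reasoning with Reusable Computation | Proposes Log-Augmented Generation, which reuses past computation to improve LLM reasoning at test time. | distillation, large language model
99 | Semi-Clairvoyant Scheduling of Speculative Decoding Requests to Minimize LLM Inference Latency | Proposes the LAPS-SD algorithm, which uses semi-clairvoyant scheduling to reduce LLM inference latency in speculative decoding. | SSM, large language model
100 | Think-J: Learning to Think for Generative LLM-as-a-Judge | Think-J: improves generative LLM-as-a-Judge by learning to think. | reinforcement learning, large language model
101 | Think Only When You Need with Large Hybrid-Reasoning Models | Proposes Large Hybrid-Reasoning Models (LHRMs), which adaptively decide whether to reason in order to improve efficiency. | reinforcement learning, large language model
102 | Can Pruning Improve Reasoning? Revisiting Long-CoT Compression with Capability in Mind for Better Reasoning | Proposes the Prune-on-Logic framework, which uses pruning to improve Long-CoT reasoning on small models. | distillation, chain-of-thought
103 | Adapting Pretrained Language Models for Citation Classification via Self-Supervised Contrastive Learning | Proposes the Citss framework, which uses self-supervised contrastive learning to improve pretrained language models on citation classification. | contrastive learning
104 | Not All Correct Answers Are Equal: Why Your Distillation Source Matters | High-quality distillation data matters: the choice of teacher model affects the reasoning ability of large models. | distillation
105 | FAID: Fine-Grained AI-Generated Text Detection Using Multi-Task Auxiliary and Multi-Level Contrastive Learning | FAID: fine-grained AI-generated text detection using multi-task auxiliary and multi-level contrastive learning. | contrastive learning
106 | DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models | Proposes DRP, a distilled reasoning pruning method based on skill-aware step decomposition that improves the efficiency of large reasoning models. | distillation, chain-of-thought
107 | Truth or Twist? Optimal Model Selection for Reliable Label Flipping Evaluation in LLM-based Counterfactuals | For LLM-generated counterfactuals, proposes a model selection method that uses an independent judge model for reliable evaluation. | distillation, large language model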

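🔬 Pillar 8: Physics-based Animation (1 paper)

# | Title | One-sentence summary | Tags | 🔗
108 | Chain-of-Thought Driven Adversarial Scenario Extrapolation for Robust Language Models | Proposes a chain-of-thought-driven adversarial scenario extrapolation method that improves language model robustness and fluency. | ASE, large language model, chain-of-thought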

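🔬 Pillar 1: Robot Control (1 paper)

# | Title | One-sentence summary | Tags | 🔗
109 | Not Minds, but Signs: Reframing LLMs through Semiotics | Reframes LLMs through a semiotic lens: focuses on sign manipulation rather than cognitive simulation. | manipulation, large language model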
