cs.CL (2025-05-20)

📊 109 papers in total | 🔗 15 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (83 🔗12) · Pillar 2: RL & Architecture (24 🔗3) · Pillar 8: Physics-based Animation (1) · Pillar 1: Robot Control (1)

🔬 Pillar 9: Embodied Foundation Models (83 papers)

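#	Title	One-line summary	Tags	🔗
1	Internal Chain-of-Thought: Empirical Evidence for Layer-wise Subtask Scheduling in LLMs	Reveals an internal chain of thought in LLMs: an empirical study of layer-wise subtask scheduling	large language model chain-of-thought
2	ABBA-Adapters: Efficient and Expressive Fine-Tuning of Foundation Models	Proposes ABBA-Adapters, an efficient and expressive fine-tuning method that improves foundation model performance.	large language model foundation model	✅
3	EfficientLLM: Efficiency in Large Language Models	EfficientLLM: a comprehensive study of efficiency benchmarks and optimization techniques for large language models	large language model foundation model
4	ModRWKV: Transformer Multimodality in Linear Time	Proposes ModRWKV, a linear-time multimodal framework built on RWKV7.	large language model multimodal
5	Enhanced Multimodal Aspect-Based Sentiment Analysis by LLM-Generated Rationales	Proposes the LRSA framework, which uses LLM-generated rationales to improve SLM performance on multimodal aspect-based sentiment analysis.	large language model multimodal
6	CAFES: A Collaborative Multi-Agent Framework for Multi-Granular Multimodal Essay Scoring	Proposes the CAFES framework to address the limitations of multimodal automated essay scoring	large language model multimodal
7	DecIF: Improving Instruction-Following through Meta-Decomposition	DecIF: improving the instruction-following ability of large language models through meta-decomposition	large language model instruction following
8	Large Language Models Implicitly Learn to See and Hear Just By Reading	Large language models implicitly learn visual and auditory abilities just by reading text	large language model
9	Saten: Sparse Augmented Tensor Networks for Post-Training Compression of Large Language Models	Proposes Saten, a sparse augmented tensor network for post-training compression of large language models.	large language model
10	Cheaper, Better, Faster, Stronger: Robust Text-to-SQL without Chain-of-Thought or Fine-Tuning	Proposes the N-rep consistency method for low-cost, highly robust Text-to-SQL without CoT or fine-tuning	chain-of-thought
11	Scaling Laws for State Dynamics in Large Language Models	Reveals the challenges large language models face in modeling state dynamics and probes their internal state-tracking mechanisms.	large language model
12	Toward Reliable Scientific Hypothesis Generation: Evaluating Truthfulness and Hallucination in Large Language Models	Proposes the TruthHypo benchmark and the KnowHD detector to evaluate truthfulness and hallucination in LLM-generated scientific hypotheses.	large language model	✅
13	Attributional Safety Failures in Large Language Models under Code-Mixed Perturbations	Reveals attributional safety failures in large language models under code-mixed perturbations and proposes mitigation strategies.	large language model
14	Neural Incompatibility: The Unbridgeable Gap of Cross-Scale Parametric Knowledge Transfer in Large Language Models	Reveals the neural incompatibility that hinders cross-scale parametric knowledge transfer in large language models	large language model	✅
15	DiagnosisArena: Benchmarking Diagnostic Reasoning for Large Language Models	DiagnosisArena: a diagnostic reasoning benchmark for evaluating large language models on medical diagnosis.	large language model	✅
16	Development and Validation of Engagement and Rapport Scales for Evaluating User Experience in Multimodal Dialogue Systems	Develops and validates engagement and rapport scales for evaluating user experience in multimodal dialogue systems	multimodal
17	Multimodal Cultural Safety: Evaluation Framework and Alignment Strategies	Proposes the CROSS benchmark and the CROSS-Eval framework to improve the cultural safety awareness and compliance of LVLMs	multimodal
18	DECASTE: Unveiling Caste Stereotypes in Large Language Models through Multi-Dimensional Bias Analysis	Proposes the DECASTE framework to uncover caste bias in large language models	large language model
19	Teaching Audio-Aware Large Language Models What Does Not Hear: Mitigating Hallucinations through Synthesized Negative Samples	Proposes the LISTEN method, which mitigates hallucinations in audio-aware large language models through synthesized negative samples	large language model
20	S2SBench: A Benchmark for Quantifying Intelligence Degradation in Speech-to-Speech Large Language Models	S2SBench: a benchmark for quantifying intelligence degradation in speech-to-speech large language models	large language model	✅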
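21	OmniGenBench: A Modular Platform for Reproducible Genomic Foundation Models Benchmarking	OmniGenBench: a modular platform for reproducible benchmarking of genomic foundation models	foundation model
22	QA-prompting: Improving Summarization with Large Language Models using Question-Answering	Proposes QA-prompting, which uses question answering to improve long-text summarization with large language models	large language model
23	Cross-Lingual Optimization for Language Transfer in Large Language Models	Proposes Cross-Lingual Optimization (CLO), which improves cross-lingual transfer in large language models while preserving English performance	large language model
24	Unraveling Interwoven Roles of Large Language Models in Authorship Privacy: Obfuscation, Mimicking, and Verification	Builds a unified framework for exploring the roles of large language models in authorship privacy: obfuscation, mimicking, and verification	large language model
25	Beyond Chains: Bridging Large Language Models and Knowledge Bases in Complex Question Answering	Proposes the PDRR framework to bridge large language models and knowledge bases in complex question answering	large language model
26	ShieldVLM: Safeguarding the Multimodal Implicit Toxicity via Deliberative Reasoning with LVLMs	Proposes ShieldVLM, which uses deliberative reasoning to improve LVLM detection of multimodal implicit toxicity.	multimodal
27	AUTOLAW: Enhancing Legal Compliance in Large Language Models via Case Law Generation and Jury-Inspired Deliberation	AutoLaw: enhancing the legal compliance of large language models through case law generation and jury-inspired deliberation	large language model
28	Activation-Guided Consensus Merging for Large Language Models	Proposes Activation-Guided Consensus Merging (ACM) to improve the merging of large language models.	large language model
29	Mixed Signals: Understanding Model Disagreement in Multimodal Empathy Detection	Studies model prediction disagreement in multimodal empathy detection, revealing latent ambiguity under modality conflict.	multimodal
30	Informatics for Food Processing	Proposes FoodProX and multimodal AI models to make food-processing assessment more objective and scalable	large language model multimodal
31	Amadeus-Verbo Technical Report: The powerful Qwen2.5 family models trained in Portuguese	Amadeus-Verbo: fine-tuned and open-sourced Qwen2.5-family large language models for Brazilian Portuguese	large language model foundation model	✅
32	PersonaTAB: Predicting Personality Traits using Textual, Acoustic, and Behavioral Cues in Fully-Duplex Speech Dialogs	PersonaTAB: predicting personality traits from textual, acoustic, and behavioral cues in full-duplex speech dialogs	large language model TAMP
33	Self-Reasoning Language Models: Unfold Hidden Reasoning Chains with Few Reasoning Catalyst	Proposes Self-Reasoning Language Models (SRLM), which iteratively improve complex reasoning using a small number of reasoning catalysts.	large language model chain-of-thought
34	Mapping the Minds of LLMs: A Graph-Based Analysis of Reasoning LLM	Proposes a graph-based analysis framework to better understand reasoning LLMs	large language model chain-of-thought
35	Too Long, Didn't Model: Decomposing LLM Long-Context Understanding With Novels	Proposes the TLDM benchmark, showing that LLM performance on long-form novel understanding drops markedly beyond 64k tokens	large language model
36	EasyMath: A 0-shot Math Benchmark for SLMs	EasyMath: a zero-shot math reasoning benchmark for small language models	chain-of-thought
37	Automated Journalistic Questions: A New Method for Extracting 5W1H in French	Proposes an automated pipeline for extracting 5W1H from French news, with performance comparable to GPT-4o.	large language model
38	UltraEdit: Training-, Subject-, and Memory-Free Lifelong Editing in Language Models	UltraEdit: a training-, subject-, and memory-free method for lifelong editing of language models	large language model	✅
39	WirelessMathBench: A Mathematical Modeling Benchmark for LLMs in Wireless Communications	WirelessMathBench: a benchmark for evaluating the mathematical modeling ability of large language models in wireless communications	large language model
40	Temporal Alignment of Time Sensitive Facts with Activation Engineering	Uses activation engineering to temporally align time-sensitive facts in LLMs, improving temporal awareness without training.	large language model
41	Through a Compressed Lens: Investigating The Impact of Quantization on Factual Knowledge Recall	Studies the impact of quantization on factual knowledge recall in large language models, revealing the information loss that quantization introduces.	large language model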
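42	Mechanistic Interpretability of GPT-like Models on Summarization Tasks	Proposes an interpretability analysis framework for GPT-like models on summarization tasks and achieves performance gains.	large language model
43	WebNovelBench: Placing LLM Novelists on the Web Novel Distribution	Proposes WebNovelBench, which evaluates LLMs on long-form novel generation and places them on the distribution of real web novels for comparison.	large language model
44	Creative Preference Optimization	Proposes Creative Preference Optimization (CrPO) to improve the novelty, diversity, and quality of LLM-generated content.	large language model
45	MUG-Eval: A Proxy Evaluation Framework for Multilingual Generation Capabilities in Any Language	MUG-Eval: a language-agnostic proxy evaluation framework for assessing the generation capabilities of large language models in any language	large language model
46	GemMaroc: Unlocking Darija Proficiency in LLMs with Minimal Data	GemMaroc: improving LLM proficiency in Moroccan Arabic (Darija) with minimal data	large language model
47	Tokenization Constraints in LLMs: A Study of Symbolic and Arithmetic Reasoning Limits	Reveals how tokenization constraints in LLMs limit symbolic and arithmetic reasoning, and introduces the concept of Token Awareness.	chain-of-thought
48	A Personalized Conversational Benchmark: Towards Simulating Personalized Conversations	PersonaConvBench: a large-scale personalized conversation benchmark for evaluating LLM reasoning and generation in multi-turn conversations.	large language model
49	GloSS over Toxicity: Understanding and Mitigating Toxicity in LLMs via Global Toxic Subspace	GloSS: suppressing toxic generation in LLMs via a global toxic subspace.	large language model
50	From Unaligned to Aligned: Scaling Multilingual LLMs with Multi-Way Parallel Corpora	Uses multi-way parallel corpora to improve the cross-lingual semantic understanding of multilingual large language models	large language model
51	FlashThink: An Early Exit Method For Efficient Reasoning	FlashThink: an early-exit method for efficient reasoning	large language model
52	EEG-to-Text Translation: A Model for Deciphering Human Brain Activity	Proposes the R1 Translator model to improve EEG-to-text decoding performance	large language model	✅
53	ConspEmoLLM-v2: A robust and stable model to detect sentiment-transformed conspiracy theories	ConspEmoLLM-v2: a robust and stable model for detecting sentiment-transformed conspiracy theories.	large language model	✅
54	Concept Incongruence: An Exploration of Time and Death in Role Playing	Explores concept incongruence around time and death in role playing, revealing latent problems in LLMs	large language model
55	Incorporating Token Usage into Prompting Strategy Evaluation	Proposes the Big-$O_{tok}$ framework for evaluating the token efficiency of prompting strategies and optimizing LLM applications.	large language model
56	SEPS: A Separability Measure for Robust Unlearning in LLMs	Proposes the SEPS evaluation framework and MP mixed-prompt unlearning to improve LLM unlearning in mixed-query scenarios	large language model
57	Tracing Multilingual Factual Knowledge Acquisition in Pretraining	Traces the acquisition of multilingual factual knowledge during pretraining, revealing two mechanisms: frequency-driven learning and cross-lingual transfer.	large language model	✅
58	Language Mixing in Reasoning Language Models: Patterns, Impact, and Internal Causes	Systematically studies language mixing in reasoning language models, revealing its patterns, impact, and internal causes.	chain-of-thought
59	sudoLLM: On Multi-role Alignment of Language Models	sudoLLM: a multi-role alignment framework that improves LLM safety under user access control.	large language model
60	TRATES: Trait-Specific Rubric-Assisted Cross-Prompt Essay Scoring	Proposes the TRATES framework, which uses LLMs and rubrics for trait-specific cross-prompt essay scoring	large language model
61	Breaking Bad Tokens: Detoxification of LLMs Using Sparse Autoencoders	Detoxifying LLMs with sparse autoencoders: breaking bad tokens	large language model
62	MoMoE: Mixture of Moderation Experts Framework for AI-Assisted Online Governance	Proposes the MoMoE framework for explainable, cross-community AI-assisted online content moderation.	large language model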
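63	Rank-K: Test-Time Reasoning for Listwise Reranking	Rank-K: a test-time reasoning method for listwise reranking that improves results on hard queries.	large language model
64	From Templates to Natural Language: Generalization Challenges in Instruction-Tuned LLMs for Spatial Reasoning	Studies the challenges instruction-tuned LLMs face in generalizing from templates to natural language in spatial reasoning	large language model
65	Pierce the Mists, Greet the Sky: Decipher Knowledge Overshadowing via Knowledge Circuit Analysis	Proposes the PhantomCircuit framework, which addresses knowledge overshadowing in LLMs through knowledge circuit analysis	large language model
66	Is Your Prompt Safe? Investigating Prompt Injection Attacks Against Open-Source LLMs	Studies prompt injection attacks against open-source LLMs and introduces new attack methods	large language model
67	Dual Decomposition of Weights and Singular Value Low Rank Adaptation	DuDe: a low-rank adaptation method based on weight decomposition and singular value decomposition that improves the stability and knowledge-transfer efficiency of LLM fine-tuning.	large language model
68	OSoRA: Output-Dimension and Singular-Value Initialized Low-Rank Adaptation	OSoRA: a low-rank adaptation method with output-dimension and singular-value initialization for efficient fine-tuning of large language models.	large language model
69	Teaching Small Language Models to Learn Logic through Meta-Learning	Trains small language models to learn logical reasoning through meta-learning	large language model
70	JOLT-SQL: Joint Loss Tuning of Text-to-SQL with Confusion-aware Noisy Schema Sampling	JOLT-SQL: joint loss tuning of Text-to-SQL with confusion-aware noisy schema sampling.	large language model	✅
71	Universal Acoustic Adversarial Attacks for Flexible Control of Speech-LLMs	Proposes universal acoustic adversarial attacks against speech LLMs that enable flexible control	large language model
72	ThinkSwitcher: When to Think Hard, When to Think Fast	Proposes ThinkSwitcher, which dynamically switches CoT reasoning modes to improve LLM efficiency	chain-of-thought
73	SlangDIT: Benchmarking LLMs in Interpretative Slang Translation	Proposes the SlangDIT benchmark and the SlangOWL model to improve LLM performance on interpretative slang translation.	large language model
74	The Strawberry Problem: Emergence of Character-level Understanding in Tokenized Language Models	Proposes a lightweight architectural modification to address character-level understanding	large language model
75	Legal Rule Induction: Towards Generalizable Principle Discovery from Analogous Judicial Precedents	Proposes the legal rule induction task and a benchmark dataset to improve LLMs' ability to discover legal principles from precedents	large language model
76	MultiHal: Multilingual Dataset for Knowledge-Graph Grounded Evaluation of LLM Hallucinations	MultiHal: a multilingual dataset for knowledge-graph-grounded evaluation of LLM hallucinations	large language model
77	BAR: A Backward Reasoning based Agent for Complex Minecraft Tasks	Proposes BAR to address reasoning in complex Minecraft tasks	large language model
78	Enhancing LLMs via High-Knowledge Data Selection	Proposes the High-Knowledge Scorer (HKS), which improves LLM performance on knowledge-intensive and general understanding tasks.	large language model
79	Beyond Text: Unveiling Privacy Vulnerabilities in Multi-modal Retrieval-Augmented Generation	Reveals new privacy vulnerabilities in multimodal retrieval-augmented generation systems and proposes a compositional structured prompt attack.	multimodal
80	Cross-Linguistic Transfer in Multilingual NLP: The Role of Language Families and Morphology	Studies how language families and morphology affect cross-lingual transfer in multilingual NLP	zero-shot transfer
81	Let's Verify Math Questions Step by Step	Proposes MathQ-Verify for verifying the validity of math questions and improving the quality of math QA data.	large language model	✅
82	PandaGuard: Systematic Evaluation of LLM Safety against Jailbreaking Attacks	PandaGuard: systematic evaluation of LLM safety defenses against jailbreak attacks	large language model
83	Improve Language Model and Brain Alignment via Associative Memory	Improves alignment between language models and the human brain by incorporating associative memory	large language model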

🔬 Pillar 2: RL & Architecture (24 papers)

#	Title	One-line summary	Tags	🔗
84	Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models	Proposes the MathIF benchmark, revealing a trade-off between reasoning ability and instruction following in large reasoning models.	reinforcement learning large language model instruction following	✅
85	Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning	Proposes CoT-Bridge to bridge "thought leaps" in chain-of-thought reasoning and improve model performance.	reinforcement learning large language model chain-of-thought
86	Toward Effective Reinforcement Learning Fine-Tuning for Medical VQA in Vision-Language Models	Studies the effectiveness of reinforcement learning fine-tuning of vision-language models for medical VQA	reinforcement learning large language model multimodal
87	FuxiMT: Sparsifying Large Language Models for Chinese-Centric Multilingual Machine Translation	FuxiMT: a sparsified large language model for Chinese-centric multilingual machine translation	curriculum learning large language model
88	Game-RL: Synthesizing Multimodal Verifiable Game Data to Boost VLMs' General Reasoning	Proposes the Game-RL framework, which uses verifiable game data to improve the general reasoning of vision-language models	reinforcement learning multimodal
89	Interpretable Traces, Unexpected Outcomes: Investigating the Disconnect in Trace-Based Knowledge Distillation	Investigates the disconnect between interpretable reasoning traces and final outcomes in trace-based knowledge distillation	distillation chain-of-thought
90	Advancing Multi-Agent RAG Systems with Minimalist Reinforcement Learning	Proposes the Mujica-MyGo framework, which tackles long-context reasoning in LLMs with multi-agent RAG and minimalist reinforcement learning.	reinforcement learning large language model
91	Reward Reasoning Model	Proposes the Reward Reasoning Model (RRM), which uses a reasoning process to improve reward model performance.	reinforcement learning large language model chain-of-thought	✅
92	General-Reasoner: Advancing LLM Reasoning Across All Domains	Proposes General-Reasoner, which improves LLM reasoning across domains and addresses data scarcity and answer diversity	reinforcement learning large language model chain-of-thought
93	Context Reasoner: Incentivizing Reasoning Capability for Contextualized Privacy and Safety Compliance via Reinforcement Learning	Proposes Context Reasoner, which uses reinforcement learning to improve LLMs' contextual reasoning for privacy and safety compliance.	reinforcement learning large language model
94	Hidden Ghost Hand: Unveiling Backdoor Vulnerabilities in MLLM-Powered Mobile GUI Agents	AgentGhost: revealing backdoor vulnerabilities in mobile GUI agents powered by multimodal large language models	contrastive learning large language model multimodal
95	Improved Methods for Model Pruning and Knowledge Distillation	Proposes MAMA Pruning, an improved method for model pruning and knowledge distillation that improves LLM performance.	distillation large language model
96	InfiGFusion: Graph-on-Logits Distillation via Efficient Gromov-Wasserstein for Model Fusion	InfiGFusion: an efficient Gromov-Wasserstein method for model fusion based on graph-on-logits distillation, improving fusion quality.	distillation large language model
97	KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation	KORGym: a dynamic game platform for evaluating LLM reasoning	reinforcement learning large language model
98	Log-Augmented Generation: Scaling Test-Time Reasoning with Reusable Computation	Proposes Log-Augmented Generation, which reuses past computation to improve LLM reasoning at test time.	distillation large language model
99	Semi-Clairvoyant Scheduling of Speculative Decoding Requests to Minimize LLM Inference Latency	Proposes the LAPS-SD algorithm, which uses semi-clairvoyant scheduling of speculative decoding requests to reduce LLM inference latency	SSM large language model
100	Think-J: Learning to Think for Generative LLM-as-a-Judge	Think-J: improving generative LLM-as-a-judge by learning to think	reinforcement learning large language model
101	Think Only When You Need with Large Hybrid-Reasoning Models	Proposes Large Hybrid-Reasoning Models (LHRMs), which adaptively decide whether to reason in order to improve efficiency.	reinforcement learning large language model
102	Can Pruning Improve Reasoning? Revisiting Long-CoT Compression with Capability in Mind for Better Reasoning	Proposes the Prune-on-Logic framework, which uses pruning to improve Long-CoT reasoning in small models	distillation chain-of-thought
103	Adapting Pretrained Language Models for Citation Classification via Self-Supervised Contrastive Learning	Proposes the Citss framework, which uses self-supervised contrastive learning to improve pretrained language models on citation classification.	contrastive learning
104	Not All Correct Answers Are Equal: Why Your Distillation Source Matters	High-quality distillation data matters: the choice of teacher model affects the reasoning ability of large models	distillation	✅
105	FAID: Fine-Grained AI-Generated Text Detection Using Multi-Task Auxiliary and Multi-Level Contrastive Learning	FAID: fine-grained AI-generated text detection using multi-task auxiliary and multi-level contrastive learning	contrastive learning
106	DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models	Proposes DRP, a distilled reasoning pruning method with skill-aware step decomposition for more efficient large reasoning models.	distillation chain-of-thought
107	Truth or Twist? Optimal Model Selection for Reliable Label Flipping Evaluation in LLM-based Counterfactuals	For LLM-generated counterfactuals, proposes a model-selection approach that uses an independent judge model for reliable label-flipping evaluation	distillation large language model

🔬 Pillar 8: Physics-based Animation (1 paper)

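#	Title	One-line summary	Tags	🔗
108	Chain-of-Thought Driven Adversarial Scenario Extrapolation for Robust Language Models	Proposes chain-of-thought-driven adversarial scenario extrapolation to improve the robustness and fluency of language models	ASE large language model chain-of-thought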

🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-line summary	Tags	🔗
109	Not Minds, but Signs: Reframing LLMs through Semiotics	Reframes LLMs through a semiotic lens: the focus is sign manipulation rather than cognitive simulation	manipulation large language model
