cs.CL (2025-10-28)

📊 52 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (38 🔗1) · Pillar 2: RL Algorithms and Architecture (11 🔗2) · Pillar 1: Robot Control (2) · Pillar 6: Video Extraction and Matching (1 🔗1)

🔬 Pillar 9: Embodied Foundation Models (38 papers)

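# | Title | One-sentence summary | Tags | 🔗
1 | ProofSketch: Efficient Verified Reasoning for Large Language Models | ProofSketch: an efficient, verifiable reasoning framework for large language models | large language model chain-of-thought
2 | MuSaG: A Multimodal German Sarcasm Dataset with Full-Modal Annotations | Proposes MuSaG, a German multimodal sarcasm dataset with full-modal annotations, to improve sarcasm detection models | large language model multimodal
3 | Seeing Through the MiRAGE: Evaluating Multimodal Retrieval Augmented Generation | Proposes the MiRAGE framework for evaluating multimodal retrieval-augmented generation systems | multimodal
4 | Beyond Neural Incompatibility: Easing Cross-Scale Knowledge Transfer in Large Language Models through Latent Semantic Alignment | Proposes a cross-scale knowledge transfer method based on latent semantic alignment to improve LLM performance | large language model
5 | Mitigating Hallucination in Large Language Models (LLMs): An Application-Oriented Survey on RAG, Reasoning, and Agentic Systems | Application-oriented survey of how RAG, reasoning, and agentic systems mitigate LLM hallucination | large language model
6 | POWSM: A Phonetic Open Whisper-Style Speech Foundation Model | Proposes POWSM, a phonetic open Whisper-style speech foundation model that handles multiple phone-related speech tasks in a unified way | foundation model
7 | Do Large Language Models Grasp The Grammar? Evidence from Grammar-Book-Guided Probing in Luxembourgish | Proposes a grammar-book-guided evaluation framework to assess how well LLMs understand Luxembourgish grammar | large language model
8 | Abjad AI at NADI 2025: CATT-Whisper: Multimodal Diacritic Restoration Using Text and Speech Representations | CATT-Whisper: multimodal Arabic diacritic restoration using text and speech representations | multimodal
9 | Towards Transparent Reasoning: What Drives Faithfulness in Large Language Models? | Investigates what drives faithfulness in LLMs, to improve trustworthiness in sensitive domains such as healthcare | large language model
10 | TEXT2DB: Integration-Aware Information Extraction with Large Language Model Agents | Proposes the TEXT2DB task and the OPAL framework, using LLM agents to perform information extraction integrated with a database | large language model
11 | Zero-Shot Cross-Lingual Transfer using Prefix-Based Adaptation | Proposes a prefix-based adaptation method for zero-shot cross-lingual transfer in LLMs | large language model zero-shot transfer
12 | Beyond MCQ: An Open-Ended Arabic Cultural QA Benchmark with Dialect Variants | Proposes an open-ended Arabic cultural QA benchmark that covers dialect variants | large language model chain-of-thought
13 | Emergence of Minimal Circuits for Indirect Object Identification in Attention-Only Transformers | Trains minimal attention-only Transformers on the indirect object identification task, revealing the core reasoning circuits | large language model
14 | Idea2Plan: Exploring AI-Powered Research Planning | Idea2Plan: explores AI-powered research planning, laying groundwork for autonomous research agents | large language model
15 | Tongyi DeepResearch Technical Report | Proposes Tongyi DeepResearch, an agentic LLM for long-horizon deep information-seeking tasks | large language model
16 | zFLoRA: Zero-Latency Fused Low-Rank Adapters | Proposes zFLoRA, zero-latency fused low-rank adapters, to address adapter inference latency in LLM deployment | large language model
17 | Towards a Method for Synthetic Generation of Persons with Aphasia Transcripts | Proposes two methods for synthetically generating transcripts of persons with aphasia, easing data scarcity | large language model
18 | A word association network methodology for evaluating implicit biases in LLMs compared to humans | Proposes a word-association-network method for evaluating implicit biases in LLMs that allows direct comparison with human biases | large language model
19 | Charting the European LLM Benchmarking Landscape: A New Taxonomy and a Set of Best Practices | Proposes a new taxonomy and a set of best practices for benchmarking LLMs on European languages | large language model
20 | Can LLMs Write Faithfully? An Agent-Based Evaluation of LLM-generated Islamic Content | Proposes an agent-based framework for evaluating the accuracy and consistency of LLM-generated Islamic content | large language model
21 | LongWeave: A Long-Form Generation Benchmark Bridging Real-World Relevance and Verifiability | Proposes the LongWeave benchmark, which uses CoV-Eval to evaluate LLMs' long-form generation in real-world scenarios | large language model
22 | Ko-MuSR: A Multistep Soft Reasoning Benchmark for LLMs Capable of Understanding Korean | Proposes the Ko-MuSR benchmark for evaluating LLMs' multistep soft reasoning over long Korean narratives | large language model
23 | RiddleBench: A New Generative Reasoning Benchmark for LLMs | RiddleBench: a new benchmark for evaluating the generative reasoning ability of LLMs | large language model
24 | WebLeaper: Empowering Efficiency and Efficacy in WebAgent via Enabling Info-Rich Seeking | WebLeaper: improves WebAgent efficiency and efficacy through info-rich seeking | large language model
25 | AgentFrontier: Expanding the Capability Frontier of LLM Agents with ZPD-Guided Data Synthesis | AgentFrontier: expands the capability frontier of LLM agents using ZPD-guided data synthesis | large language model
26 | STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence | Proposes STAR-Bench for evaluating models' audio 4D spatio-temporal reasoning | large language model
27 | "Mm, Wat?" Detecting Other-initiated Repair Requests in Dialogue | Proposes a multimodal model that improves detection of other-initiated repair requests in dialogue systems | multimodal
28 | Diffusion LLM with Native Variable Generation Lengths: Let [EOS] Lead the Way | Proposes dLLM-Var, a diffusion language model with native variable-length generation that significantly speeds up inference | large language model
29 | ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization | Proposes ReForm, which improves the formalization of natural-language mathematics through reflective autoformalization and prospective bounded sequence optimization | large language model
30 | Open Korean Historical Corpus: A Millennia-Scale Diachronic Collection of Public Domain Texts | Builds a large-scale open Korean historical corpus to support quantitative study of historical change in Korean | large language model
31 | Parallel Loop Transformer for Efficient Test-Time Computation Scaling | Proposes the Parallel Loop Transformer (PLT), which speeds up LLM test-time computation and reduces memory usage | large language model
32 | CritiCal: Can Critique Help LLM Uncertainty or Confidence Calibration? | CritiCal: uses natural-language critiques to improve LLM uncertainty and confidence calibration | large language model
33 | LuxIT: A Luxembourgish Instruction Tuning Dataset from Monolingual Seed Data | LuxIT: a Luxembourgish instruction-tuning dataset built from monolingual seed data | large language model
34 | Automatically Benchmarking LLM Code Agents through Agent-Driven Annotation and Evaluation | Proposes PRDBench, an agent-driven benchmark for LLM code agents that addresses high annotation cost and narrow evaluation metrics | large language model
35 | Evaluating LLMs on Generating Age-Appropriate Child-Like Conversations | Evaluates LLMs' ability to generate age-appropriate child-like conversations | large language model
36 | Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures | Proposes Global PIQA for evaluating LLMs' physical commonsense reasoning across 100+ languages and cultures | large language model
37 | Pie: A Programmable Serving System for Emerging LLM Applications | Pie: a programmable LLM serving system that offers flexible, efficient support for emerging applications | large language model
38 | Success and Cost Elicit Convention Formation for Efficient Communication | Proposes a success- and cost-driven approach to convention formation in dialogue, improving the efficiency of multimodal communication | multimodal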

🔬 Pillar 2: RL Algorithms and Architecture (11 papers)

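# | Title | One-sentence summary | Tags | 🔗
39 | SelecTKD: Selective Token-Weighted Knowledge Distillation for LLMs | SelecTKD: a selective token-weighted knowledge distillation framework for LLMs | teacher-student distillation large language model
40 | SemCoT: Accelerating Chain-of-Thought Reasoning through Semantically-Aligned Implicit Tokens | SemCoT: accelerates chain-of-thought reasoning through semantically aligned implicit tokens | distillation chain-of-thought
41 | Lookahead Tree-Based Rollouts for Enhanced Trajectory-Level Exploration in Reinforcement Learning with Verifiable Rewards | Proposes LATR, which uses lookahead tree-based rollouts to improve trajectory-level exploration in reinforcement learning with verifiable rewards | reinforcement learning policy learning large language model
42 | Can LLMs Translate Human Instructions into a Reinforcement Learning Agent's Internal Emergent Symbolic Representation? | Studies whether LLMs can translate human instructions into a reinforcement learning agent's internal symbolic representation | reinforcement learning large language model
43 | SPARTA: Evaluating Reasoning Segmentation Robustness through Black-Box Adversarial Paraphrasing in Text Autoencoder Latent Space | SPARTA: evaluates the robustness of reasoning segmentation through black-box adversarial paraphrasing in a text autoencoder's latent space | reinforcement learning large language model multimodal
44 | OpenReward: Learning to Reward Long-form Agentic Tasks via Reinforcement Learning | Proposes OpenReward, which trains a tool-augmented reward model via reinforcement learning to improve reward assessment for long-form agentic tasks | reinforcement learning large language model
45 | Reinforcement Learning for Long-Horizon Multi-Turn Search Agents | Proposes a reinforcement-learning-based long-horizon multi-turn search agent that significantly improves legal document search accuracy | reinforcement learning large language model
46 | Evolving Diagnostic Agents in a Virtual Clinical Environment | Proposes a reinforcement-learning-based diagnostic agent framework that improves LLM diagnosis in a virtual clinical environment | reinforcement learning world model large language model
47 | Critique-RL: Training Language Models for Critiquing through Two-Stage Reinforcement Learning | Proposes Critique-RL, which trains critiquing language models through two-stage reinforcement learning | reinforcement learning
48 | Teaching LLMs to Abstain via Fine-Grained Semantic Confidence Reward | Proposes a fine-grained semantic confidence reward that improves LLMs' ability to abstain from answering | reinforcement learning large language model
49 | Optimizing Retrieval for RAG via Reinforcement Learning | Proposes the R3 framework, which optimizes the RAG retriever via reinforcement learning to improve AI reasoning performance | reinforcement learning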

🔬 Pillar 1: Robot Control (2 papers)

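# | Title | One-sentence summary | Tags | 🔗
50 | ComboBench: Can LLMs Manipulate Physical Devices to Play Virtual Reality Games? | ComboBench: evaluates LLMs' ability to manipulate physical devices in VR games | manipulation large language model
51 | Quantifying the Effects of Word Length, Frequency, and Predictability on Dyslexia | Quantifies the effects of word length, frequency, and predictability on dyslexia, to guide interventions | manipulation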

🔬 Pillar 6: Video Extraction and Matching (1 paper)

# | Title | One-sentence summary | Tags | 🔗
52 | Dark & Stormy: Modeling Humor in the Worst Sentences Ever Written | Proposes a new method for analyzing bad humor in English | HuMoR
