cs.CL (2025-09-26)

📊 71 papers in total | 🔗 10 with code

🎯 Interest area navigation

Pillar 9: Embodied Large Models (Embodied Foundation Models) (55 🔗9) · Pillar 2: RL Algorithms and Architecture (RL & Architecture) (14 🔗1) · Pillar 6: Video Extraction and Matching (Video Extraction) (1) · Pillar 1: Robot Control (1)

🔬 Pillar 9: Embodied Large Models (Embodied Foundation Models) (55 papers)

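| # | Title | Key point | Tags |
|---|---|---|---|
| 1 | CHRONOBERG: Capturing Language Evolution and Temporal Awareness in Foundation Models | CHRONOBERG: builds a temporal corpus to improve LLMs' understanding of language evolution and temporal awareness | large language model, foundation model |
| 2 | Thinking in Many Modes: How Composite Reasoning Elevates Large Language Model Performance with Limited Data | Proposes Composite Reasoning (CR) to improve LLMs' complex problem solving with limited data | large language model, chain-of-thought |
| 3 | R-Capsule: Compressing High-Level Plans for Efficient Large Language Model Reasoning | Proposes the R-Capsule framework to improve LLM reasoning efficiency | large language model, chain-of-thought |
| 4 | Why Chain of Thought Fails in Clinical Text Understanding | Large-scale experiments show that, and why, chain-of-thought (CoT) prompting fails in clinical text understanding | large language model, chain-of-thought |
| 5 | Large language models management of medications: three performance analyses | Evaluates LLMs on medication management tasks, revealing limitations in drug recommendation | large language model |
| 6 | Evaluating Uncertainty Quantification Methods in Argumentative Large Language Models | Evaluates the effectiveness of uncertainty quantification methods in argumentative LLMs | large language model |
| 7 | Evaluating the Limits of Large Language Models in Multilingual Legal Reasoning | Evaluates the limits of LLMs in multilingual legal reasoning | large language model |
| 8 | Detecting (Un)answerability in Large Language Models with Linear Directions | Uses linear directions to detect unanswerability in LLMs for extractive question answering | large language model |
| 9 | Exploratory Semantic Reliability Analysis of Wind Turbine Maintenance Logs using Large Language Models | Uses LLMs for exploratory semantic reliability analysis of wind turbine maintenance logs | large language model |
| 10 | The Outputs of Large Language Models are Meaningless | Argues that LLM outputs are meaningless, challenging existing views of semantic understanding | large language model |
| 11 | From Long to Lean: Performance-aware and Adaptive Chain-of-Thought Compression via Multi-round Refinement | Proposes the MACC framework, which adaptively compresses CoT through multi-round refinement, improving reasoning efficiency and accuracy | chain-of-thought |
| 12 | FoodSEM: Large Language Model Specialized in Food Named-Entity Linking | FoodSEM: an LLM specialized in food named-entity linking | large language model |
| 13 | Evaluating Open-Source Large Language Models for Technical Telecom Question Answering | Evaluates open-source LLMs on technical telecom question answering | large language model |
| 14 | Debiasing Large Language Models in Thai Political Stance Detection via Counterfactual Calibration | Proposes the ThaiFACTUAL framework to address LLM bias in Thai political stance detection | large language model |
| 15 | SBFA: Single Sneaky Bit Flip Attack to Break Large Language Models | Proposes SBFA, a single-bit-flip attack that breaks LLMs, exposing a serious security risk | large language model |
| 16 | Quantifying the Impact of Structured Output Format on Large Language Models through Causal Inference | Uses causal inference to quantify the impact of structured output formats on LLMs | large language model |
| 17 | ADAM: A Diverse Archive of Mankind for Evaluating and Enhancing LLMs in Biographical Reasoning | Proposes the ADAM framework for evaluating and enhancing LLMs in biographical reasoning | large language model, multimodal |
| 18 | VoiceAssistant-Eval: Benchmarking AI Assistants across Listening, Speaking, and Viewing | Proposes VoiceAssistant-Eval to evaluate the capabilities of multimodal AI assistants | large language model, multimodal |
| 19 | RedNote-Vibe: A Dataset for Capturing Temporal Dynamics of AI-Generated Text in Social Media | RedNote-Vibe: a dataset for capturing the temporal dynamics of AI-generated text in social media | large language model, TAMP |
| 20 | Human Mobility Datasets Enriched With Contextual and Social Dimensions | Proposes a framework for building urban human mobility datasets that combine contextual and social dimensions with LLM-generated data | large language model, multimodal |
| 21 | AI Brown and AI Koditex: LLM-Generated Corpora Comparable to Traditional Corpora of English and Czech Texts | Presents AI Brown and AI Koditex: LLM-generated English and Czech corpora comparable to traditional corpora | large language model |
| 22 | Can Prompts Rewind Time for LLMs? Evaluating the Effectiveness of Prompted Knowledge Cutoffs | Can prompt engineering rewind time for LLMs? Evaluates the effectiveness of prompted knowledge cutoffs | large language model |
| 23 | The Bias is in the Details: An Assessment of Cognitive Bias in LLMs | Assesses cognitive bias in LLMs, revealing systematic biases in model decision-making | large language model |
| 24 | Towards Generalizable Implicit In-Context Learning with Attention Routing | Proposes In-Context Routing (ICR) to improve the generalization of implicit in-context learning | large language model |
| 25 | ArabJobs: A Multinational Corpus of Arabic Job Ads | ArabJobs: a multinational corpus of Arabic job ads for fair Arabic NLP and labor market research | large language model |
| 26 | InfiR2: A Comprehensive FP8 Training Recipe for Reasoning-Enhanced Language Models | InfiR2: a comprehensive FP8 training recipe for reasoning-enhanced language models | large language model |
| 27 | We Think, Therefore We Align LLMs to Helpful, Harmless and Honest Before They Go Wrong | Proposes the Adaptive Multi-Branch Steering (AMBS) framework to improve LLM alignment on HHH objectives | large language model |
| 28 | Representing LLMs in Prompt Semantic Task Space | Proposes a training-free method that represents LLMs as linear operators in prompt semantic task space, for model selection | large language model |
| 29 | FormalML: A Benchmark for Evaluating Formal Subgoal Completion in Machine Learning Theory | Proposes the FormalML benchmark to evaluate LLMs on formal subgoal completion in machine learning theory | large language model |
| 30 | What Is The Political Content in LLMs' Pre- and Post-Training Data? | Analyzes political leanings in LLM training data, revealing the correlation between model bias and data content | large language model |
| 31 | The InviTE Corpus: Annotating Invectives in Tudor English Texts for Computational Modeling | Builds the InviTE corpus for computational modeling of religious invective in Tudor English texts | large language model |
| 32 | Transformers Can Learn Connectivity in Some Graphs but Not Others | Shows that Transformers learn connectivity on grid-like graphs but struggle on complex graphs | large language model |
| 33 | Beyond Textual Context: Structural Graph Encoding with Adaptive Space Alignment to alleviate the hallucination of LLMs | Proposes SSKG-LLM, which mitigates LLM hallucination through structural graph encoding and adaptive space alignment | large language model |
| 34 | Safety Compliance: Rethinking LLM Safety Reasoning through the Lens of Compliance | Proposes the Safety Compliance framework, improving LLM safety through the lens of legal compliance | large language model |
| 35 | FLEXI: Benchmarking Full-duplex Human-LLM Speech Interaction | FLEXI: the first benchmark for full-duplex human-LLM speech interaction, with a focus on interruption in emergency scenarios | large language model |
| 36 | FeatBench: Evaluating Coding Agents on Feature Implementation for Vibe Coding | FeatBench: a benchmark for evaluating coding agents on feature implementation in vibe coding | large language model |
| 37 | Question-Driven Analysis and Synthesis: Building Interpretable Thematic Trees with LLMs for Text Clustering and Controllable Generation | Proposes Recursive Thematic Partitioning (RTP), which uses LLMs to build interpretable thematic trees for text clustering and controllable generation | large language model |
| 38 | Library Hallucinations in LLMs: Risk Analysis Grounded in Developer Queries | Systematically analyzes the risk of library hallucinations in LLM code generation triggered by developer queries | large language model |
| 39 | Mixture of Detectors: A Compact View of Machine-Generated Text Detection | Proposes the mixture-of-detectors framework BMAS English for comprehensive evaluation of machine-generated text detection | large language model |
| 40 | Universal Legal Article Prediction via Tight Collaboration between Supervised Classification Model and LLM | Proposes the Uni-LAP framework for universal legal article prediction through tight collaboration between a supervised classification model and an LLM | large language model |
| 41 | Think Right, Not More: Test-Time Scaling for Numerical Claim Verification | Proposes VERIFIERFC, which uses test-time scaling to improve LLM performance on numerical claim verification | large language model |
| 42 | COSPADI: Compressing LLMs via Calibration-Guided Sparse Dictionary Learning | Proposes CoSpaDi, which compresses LLMs via calibration-guided sparse dictionary learning, improving compression performance | large language model |
| 43 | Fine-tuning Done Right in Model Editing | Re-establishes the role of fine-tuning in model editing: proposes LocFT-BF, which substantially outperforms existing methods | large language model |
| 44 | Speak Your Mind: The Speech Continuation Task as a Probe of Voice-Based Model Bias | Proposes the speech continuation task to probe voice-based bias in speech models | foundation model |
| 45 | Black-Box Hallucination Detection via Consistency Under the Uncertain Expression | Proposes a black-box method based on consistency under uncertain expression to detect hallucinations in LLMs | large language model |
| 46 | MotivGraph-SoIQ: Integrating Motivational Knowledge Graphs and Socratic Dialogue for Enhanced LLM Ideation | Proposes MotivGraph-SoIQ to address bias and insufficient grounding in LLM ideation | large language model |
| 47 | SimulSense: Sense-Driven Interpreting for Efficient Simultaneous Speech Translation | SimulSense: efficient simultaneous speech translation through sense-driven interpreting | large language model |
| 48 | A Large-Scale Dataset and Citation Intent Classification in Turkish with LLMs | Presents a Turkish citation intent classification dataset and a DSPy-optimized LLM classification framework | large language model |
| 49 | AgentPack: A Dataset of Code Changes, Co-Authored by Agents and Humans | AgentPack: a dataset of code changes co-authored by agents and humans, for improving code-editing models | large language model |
| 50 | LUMINA: Detecting Hallucinations in RAG System with Context-Knowledge Signals | LUMINA: detects hallucinations in RAG systems using context-knowledge signals | large language model |
| 51 | Enhancing Low-Rank Adaptation with Structured Nonlinear Transformations | LoRAN: enhances low-rank adaptation with structured nonlinear transformations | large language model |
| 52 | What Makes LLM Agent Simulations Useful for Policy? Insights From an Iterative Design Engagement in Emergency Preparedness | Uses LLM agent simulations to improve the effectiveness of emergency plans: an iterative design case study | large language model |
| 53 | Following the TRACE: A Structured Path to Empathetic Response Generation with Multi-Agent Models | Proposes the TRACE framework, which decomposes the task across multi-agent models to improve empathetic response generation | large language model |
| 54 | Collaborative and Proactive Management of Task-Oriented Conversations | Proposes an information-state-based collaborative model for task-oriented dialogue management, improving dialogue success rate | large language model |
| 55 | Can LLMs Solve and Generate Linguistic Olympiad Puzzles? | Uses LLMs to solve and generate Linguistics Olympiad puzzles | large language model |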

🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (14 papers)

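| # | Title | Key point | Tags |
|---|---|---|---|
| 56 | EditGRPO: Reinforcement Learning with Post-Rollout Edits for Clinically Accurate Chest X-Ray Report Generation | EditGRPO: reinforcement learning with post-rollout edits to improve the clinical accuracy of chest X-ray reports | reinforcement learning, large language model, multimodal |
| 57 | Exploring Solution Divergence and Its Effect on Large Language Model Problem Solving | Proposes LLM training and evaluation methods based on solution divergence, improving problem-solving ability | reinforcement learning, large language model |
| 58 | AutoSCORE: Enhancing Automated Scoring with Multi-Agent Large Language Models via Structured Component Recognition | AutoSCORE: enhances automated scoring with structured component recognition and multi-agent LLMs | MAE, large language model |
| 59 | QoNext: Towards Next-generation QoE for Foundation Models | QoNext: a next-generation QoE evaluation framework for the interaction experience of foundation models | predictive model, foundation model |
| 60 | ResT: Reshaping Token-Level Policy Gradients for Tool-Use Large Language Models | ResT: reshapes token-level policy gradients to improve LLM tool use | reinforcement learning, large language model |
| 61 | WebGen-Agent: Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning | WebGen-Agent: enhances interactive website generation with multi-level feedback and step-level reinforcement learning | reinforcement learning, large language model |
| 62 | No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping | Proposes the RL-ZVP algorithm, which exploits zero-variance prompts in LLM reinforcement learning to improve mathematical reasoning | reinforcement learning, large language model |
| 63 | Critique-Coder: Enhancing Coder Models by Critique Reinforcement Learning | Proposes Critique-Coder, which improves code generation models through critique reinforcement learning | reinforcement learning, distillation |
| 64 | ML2B: Multi-Lingual ML Benchmark For AutoML | Proposes the ML2B multilingual machine learning benchmark to evaluate the cross-lingual code generation ability of AutoML models | representation learning, large language model |
| 65 | When Does Reasoning Matter? A Controlled Study of Reasoning's Contribution to Model Performance | Studies the contribution of reasoning to LLM performance, showing how its effectiveness varies across tasks and model scales | distillation, large language model |
| 66 | Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding | Proposes Group Tree Optimization to address draft policy misalignment in speculative decoding, speeding up LLM inference | PPO, large language model |
| 67 | S2J: Bridging the Gap Between Solving and Judging Ability in Generative Reward Models | Proposes S2J to bridge the gap between solving ability and judging ability in generative reward models | distillation, large language model |
| 68 | Evaluating and Improving Cultural Awareness of Reward Models for LLM Alignment | Proposes the CARB benchmark to evaluate and improve the cultural awareness of LLM reward models | reinforcement learning, large language model |
| 69 | StateX: Enhancing RNN Recall via Post-training State Expansion | StateX: enhances RNN recall through post-training state expansion | state space model, linear attention |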

🔬 Pillar 6: Video Extraction and Matching (Video Extraction) (1 paper)

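| # | Title | Key point | Tags |
|---|---|---|---|
| 70 | Towards Minimal Causal Representations for Human Multimodal Language Understanding | Proposes the CaMIB model, which uses causal inference to improve generalization in multimodal language understanding | HuMoR, multimodal |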

🔬 Pillar 1: Robot Control (1 paper)

| # | Title | Key point | Tags |
|---|---|---|---|
| 71 | ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents | ChatInject: abuses chat templates to mount prompt injection attacks on LLM agents | manipulation, large language model, instruction following |
