cs.CL(2025-10-13)

📊 53 papers in total | 🔗 7 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (42 🔗4) · Pillar 2: RL & Architecture (10 🔗2) · Pillar 1: Robot Control (1 🔗1)

🔬 Pillar 9: Embodied Foundation Models (42 papers)

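# | Title | Key point | Tags
1 | A$^2$FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning | Proposes A$^2$FM to address inefficiency in reasoning and tool calling | large language model, foundation model, chain-of-thought
2 | VCB Bench: An Evaluation Benchmark for Audio-Grounded Large Language Model Conversational Agents | Proposes VCB Bench, a Chinese benchmark for evaluating voice-driven LLM conversational agents | large language model, multimodal, instruction following
3 | Evaluating Reasoning Faithfulness in Medical Vision-Language Models using Multimodal Perturbations | Proposes a multimodal-perturbation framework for evaluating the reasoning faithfulness of medical vision-language models, applied to chest X-ray VQA | multimodal, chain-of-thought
4 | Hallucination Detection via Internal States and Structured Reasoning Consistency in Large Language Models | Proposes the HalluDet framework, which detects LLM hallucinations via internal states and structured reasoning consistency | large language model, chain-of-thought
5 | Investigating Large Language Models' Linguistic Abilities for Text Preprocessing | Uses large language models for text preprocessing to improve downstream text classification performance | large language model
6 | StoryBox: Collaborative Multi-Agent Simulation for Hybrid Bottom-Up Long-Form Story Generation Using Large Language Models | Proposes StoryBox, which uses collaborative multi-agent simulation for hybrid bottom-up long-form story generation | large language model
7 | MeTA-LoRA: Data-Efficient Multi-Task Fine-Tuning for Large Language Models | MeTA-LoRA: a data-efficient multi-task fine-tuning method for large language models | large language model
8 | Survey Response Generation: Generating Closed-Ended Survey Responses In-Silico with Large Language Models | Systematically studies how different methods affect LLM-generated closed-ended survey responses and offers practical recommendations | large language model
9 | LLMAtKGE: Large Language Models as Explainable Attackers against Knowledge Graph Embeddings | Proposes LLMAtKGE, which uses large language models as explainable adversarial attackers against knowledge graph embeddings | large language model
10 | Are Large Language Models Effective Knowledge Graph Constructors? | Proposes a knowledge graph construction method based on a hierarchical extraction framework, improving LLM performance on knowledge-intensive tasks | large language model
11 | Do Psychometric Tests Work for Large Language Models? Evaluation of Tests on Sexism, Racism, and Morality | Evaluates the validity of psychometric tests for large language models, revealing their limitations in assessing sexism, racism, and morality | large language model
12 | Ensembling Large Language Models to Characterize Affective Dynamics in Student-AI Tutor Dialogues | Proposes an ensemble LLM framework for analyzing affective dynamics in student-AI tutor dialogues | large language model
13 | DND: Boosting Large Language Models with Dynamic Nested Depth | DND: improves large language model performance through dynamic nested depth | large language model
14 | Judge Before Answer: Can MLLM Discern the False Premise in Question? | Proposes the JBA dataset and a recognition-enhancement framework to improve multimodal LLMs' ability to identify false premises | large language model, multimodal
15 | UALM: Unified Audio Language Model for Understanding, Generation and Reasoning | Proposes UALM, a unified audio language model for audio understanding, generation, and cross-modal reasoning | multimodal
16 | LLM Knowledge is Brittle: Truthfulness Representations Rely on Superficial Resemblance | Shows that the brittleness of LLM knowledge stems from reliance on superficial resemblance rather than robust knowledge representations | large language model
17 | Conjecturing: An Overlooked Step in Formal Mathematical Reasoning | Proposes ConjectureBench to evaluate the overlooked conjecturing step in LLM formal mathematical reasoning, and designs the Lean-FIRe method to improve performance | large language model
18 | Direct Multi-Token Decoding | Proposes Direct Multi-Token Decoding (DMTD), which accelerates decoder-only LLM inference without additional parameters | large language model
19 | TopoAlign: A Framework for Aligning Code to Math via Topological Decomposition | The TopoAlign framework aligns code to math via topological decomposition, improving the autoformalization ability of math LLMs | large language model
20 | FaStfact: Faster, Stronger Long-Form Factuality Evaluations in LLMs | FaStfact: a faster, stronger framework for evaluating long-form factuality in LLMs | large language model
21 | PHANTOM RECALL: When Familiar Puzzles Fool Smart Models | The PHANTOM RECALL benchmark reveals LLMs' over-reliance on memorized templates in logical reasoning | large language model
22 | Early Detection and Reduction of Memorisation for Domain Adaptation and Instruction Tuning | Proposes n-gram-based early stopping and regularization to reduce model memorization in domain adaptation and instruction tuning | large language model
23 | Repurposing Annotation Guidelines to Instruct LLM Annotators: A Case Study | Proposes an LLM-based method for reworking annotation guidelines, improving text annotation efficiency and quality | large language model
24 | LLM-Specific Utility: A New Perspective for Retrieval-Augmented Generation | Proposes LLM-specific utility to optimize model-tailored evidence selection in retrieval-augmented generation | large language model
25 | Do LLMs "Feel"? Emotion Circuits Discovery and Control | Uncovers emotion circuits in LLMs, enabling precise and controllable emotional expression | large language model
26 | Generate Logical Equivalence Questions | Proposes a formal-language-based method for automatically generating logical equivalence questions, improving efficiency while ensuring difficulty | large language model
27 | TextBandit: Evaluating Probabilistic Reasoning in LLMs Through Language-Only Decision Tasks | TextBandit: proposes a multi-armed bandit benchmark based on text-only feedback to evaluate LLMs' probabilistic reasoning | large language model
28 | Deep Research Brings Deeper Harm | Reveals the potential harms of LLM-based Deep Research agents in areas such as biosecurity | large language model
29 | Task-Aware Reduction for Scalable LLM-Database Systems | Proposes a task-aware reduction method to improve the efficiency and sustainability of LLM-database systems handling massive data | large language model
30 | ACADREASON: Exploring the Limits of Reasoning Models with Academic Research Problems | Proposes the Acadreason benchmark to evaluate the reasoning ability of LLMs and agents on academic research problems | large language model
31 | Invisible Languages of the LLM Universe | Reveals language inequality in LLMs, emphasizing the continuity between the digital divide and colonial-era language hierarchies | large language model
32 | Who are you, ChatGPT? Personality and Demographic Style in LLM-Generated Content | Proposes a data-driven method for analyzing personality and demographic traits in LLM-generated content | large language model
33 | Valid Survey Simulations with Limited Human Data: The Roles of Prompting, Fine-Tuning, and Rectification | Proposes a survey simulation method that combines LLM synthesis with bias rectification, increasing effective sample size and reducing bias | large language model
34 | Beyond Survival: Evaluating LLMs in Social Deduction Games with Human-Aligned Strategies | Proposes an evaluation framework aligned with human strategies for assessing LLMs in social deduction games such as Werewolf | multimodal
35 | XQuant: Achieving Ultra-Low Bit KV Cache Quantization with Cross-Layer Compression | XQuant: achieves ultra-low-bit KV cache quantization through cross-layer compression, improving long-text processing efficiency | large language model
36 | CNSocialDepress: A Chinese Social Media Dataset for Depression Risk Detection and Structured Analysis | Releases CNSocialDepress, a Chinese social media dataset for depression risk detection that supports structured analysis | large language model
37 | TypePilot: Leveraging the Scala Type System for Secure LLM-generated Code | TypePilot: leverages the Scala type system to improve the security of LLM-generated code | large language model
38 | ABLEIST: Intersectional Disability Bias in LLM-Generated Hiring Scenarios | ABLEIST: reveals intersectional disability bias in LLM-generated hiring scenarios | large language model
39 | Secret-Protected Evolution for Differentially Private Synthetic Text Generation | Proposes the Secret-Protected Evolution framework for differentially private synthetic text generation, improving the utility-privacy trade-off | large language model
40 | The Social Cost of Intelligence: Emergence, Propagation, and Amplification of Stereotypical Bias in Multi-Agent Systems | Studies the emergence, propagation, and amplification of stereotypical bias in multi-agent systems | large language model
41 | ADVICE: Answer-Dependent Verbalized Confidence Estimation | Proposes the ADVICE framework to address answer-independent confidence estimation in large language models | large language model
42 | Rethinking Agentic Workflows: Evaluating Inference-Based Test-Time Scaling Strategies in Text2SQL Tasks | Evaluates the effectiveness of inference-time scaling strategies on Text2SQL tasks to optimize agentic workflows | large language model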

🔬 Pillar 2: RL & Architecture (10 papers)

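# | Title | Key point | Tags
43 | Scaling Language-Centric Omnimodal Representation Learning | Proposes the LCO-Emb framework, which improves cross-modal retrieval through language-centric multimodal representation learning | representation learning, contrastive learning, large language model
44 | Enhancing Long Chain-of-Thought Reasoning through Multi-Path Plan Aggregation | Proposes the Multi-Path Plan Aggregation (MPPA) framework to enhance long chain-of-thought reasoning in language models | DPO, distillation, chain-of-thought
45 | Enhancing Large Language Model Reasoning via Selective Critical Token Fine-Tuning | Proposes Critical Token Fine-Tuning (CFT), improving LLM performance and generalization on mathematical reasoning tasks | reinforcement learning, large language model
46 | FOSSIL: Harnessing Feedback on Suboptimal Samples for Data-Efficient Generalisation with Imitation Learning for Embodied Vision-and-Language Tasks | FOSSIL: uses feedback on suboptimal samples to improve the data efficiency and generalization of imitation learning for embodied vision-and-language tasks | reinforcement learning, imitation learning, embodied AI
47 | Stabilizing MoE Reinforcement Learning by Aligning Training and Inference Routers | Proposes the R3 method, which aligns the training and inference routers in MoE reinforcement learning to stabilize training | reinforcement learning, large language model
48 | R-WoM: Retrieval-augmented World Model For Computer-use Agents | Proposes R-WoM, which augments LLM world models with retrieval to improve computer-use agents' decision-making in digital environments | world model, large language model
49 | Information-Preserving Reformulation of Reasoning Traces for Antidistillation | Proposes the PART method, which achieves antidistillation by reformulating reasoning traces, protecting LLM intellectual property | distillation, large language model
50 | Demystifying Reinforcement Learning in Agentic Reasoning | Reveals key design principles and practices for agentic RL in LLM reasoning through a systematic study | reinforcement learning, reward shaping
51 | Scaling Long-Horizon LLM Agent via Context-Folding | Proposes the Context-Folding framework to address context-length limits in long-horizon LLM agent tasks | reinforcement learning, large language model
52 | LLM-Oriented Token-Adaptive Knowledge Distillation | Proposes AdaKD, an LLM-oriented token-adaptive knowledge distillation framework that improves student model performance | distillation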

🔬 Pillar 1: Robot Control (1 paper)

# | Title | Key point | Tags
53 | Bag of Tricks for Subverting Reasoning-based Safety Guardrails | Reveals the fragility of reasoning-based safety guardrails and proposes a set of attacks that bypass them | manipulation
