cs.CL (2026-04-13)

📊 35 papers in total | 🔗 11 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (21 🔗6) · Pillar 2: RL & Architecture (12 🔗4) · Pillar 1: Robot Control (1) · Pillar 6: Video Extraction (1 🔗1)

🔬 Pillar 9: Embodied Foundation Models (21 papers)

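| # | Title | Key Point | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Enhancing Multimodal Large Language Models for Ancient Chinese Character Evolution Analysis via Glyph-Driven Fine-Tuning | Proposes the GEVO framework, which uses glyph-driven fine-tuning to strengthen multimodal large language models' analysis of ancient Chinese character evolution. | large language model, multimodal | |
| 2 | NovBench: Evaluating Large Language Models on Academic Paper Novelty Assessment | Proposes the NovBench benchmark for evaluating large language models on assessing the novelty of academic papers. | large language model, instruction following | |
| 3 | General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks | General365: builds a general reasoning benchmark that evaluates large language models' reasoning across diverse tasks. | large language model | |
| 4 | METER: Evaluating Multi-Level Contextual Causal Reasoning in Large Language Models | METER: evaluates large language models on multi-level contextual causal reasoning. | large language model | |
| 5 | AOP-Smart: A RAG-Enhanced Large Language Model Framework for Adverse Outcome Pathway Analysis | AOP-Smart: a RAG-enhanced large language model framework for adverse outcome pathway analysis. | large language model | |
| 6 | Dialectic-Med: Mitigating Diagnostic Hallucinations via Counterfactual Adversarial Multi-Agent Debate | Dialectic-Med: mitigates hallucinations in medical diagnosis through adversarial multi-agent debate. | large language model, multimodal, chain-of-thought | |
| 7 | How Robust Are Large Language Models for Clinical Numeracy? An Empirical Study on Numerical Reasoning Abilities in Clinical Contexts | Proposes ClinicNumRobBench to evaluate the robustness of large language models in clinical numerical reasoning. | large language model | |
| 8 | RPA-Check: A Multi-Stage Automated Framework for Evaluating Dynamic LLM-based Role-Playing Agents | RPA-Check: a multi-stage automated framework for evaluating LLM-based role-playing agents in constrained environments. | large language model, chain-of-thought | |
| 9 | A Systematic Analysis of the Impact of Persona Steering on LLM Capabilities | Improving LLM capabilities through persona steering: a systematic analysis and a dynamic routing strategy. | large language model, instruction following | |
| 10 | A Triadic Suffix Tokenization Scheme for Numerical Reasoning | Proposes the Triadic Suffix Tokenization (TST) scheme to address inconsistent number tokenization in LLM numerical reasoning. | large language model | |
| 11 | C-ReD: A Comprehensive Chinese Benchmark for AI-Generated Text Detection Derived from Real-World Prompts | Proposes C-ReD, a comprehensive Chinese benchmark for AI-generated text detection built from real-world prompts. | large language model | |
| 12 | METRO: Towards Strategy Induction from Expert Dialogue Transcripts for Non-collaborative Dialogues | METRO: induces non-collaborative dialogue strategies from expert dialogue transcripts. | large language model | |
| 13 | Do LLMs Know Tool Irrelevance? Demystifying Structural Alignment Bias in Tool Invocations | Reveals structural alignment bias in LLM tool invocation and proposes the SABEval dataset and a rebalancing strategy. | large language model | |
| 14 | Exploring Knowledge Conflicts for Faithful LLM Reasoning: Benchmark and Method | Proposes the ConflictQA benchmark and the XoT framework to address LLM reasoning over heterogeneous, conflicting knowledge. | large language model | |
| 15 | CocoaBench: Evaluating Unified Digital Agents in the Wild | Proposes CocoaBench for evaluating unified digital agents on complex tasks. | visual grounding | |
| 16 | Efficient Training for Cross-lingual Speech Language Models | Proposes CSLM, a cross-lingual speech language model that achieves cross-modal and cross-lingual alignment through efficient training. | large language model | |
| 17 | Psychological Concept Neurons: Can Neural Control Bias Probing and Shift Generation in LLMs? | Studies the link between personality-trait representations and behavioral outputs in LLMs by intervening on psychological concept neurons. | large language model | |
| 18 | CLSGen: A Dual-Head Fine-Tuning Framework for Joint Probabilistic Classification and Verbalized Explanation | CLSGen: a dual-head fine-tuning framework for joint probabilistic classification and textual explanation. | large language model | |
| 19 | Agentic Aggregation for Parallel Scaling of Long-Horizon Agentic Tasks | Proposes AggAgent, which uses agentic aggregation to scale long-horizon agent tasks in parallel. | chain-of-thought | |
| 20 | Hidden Failures in Robustness: Why Supervised Uncertainty Quantification Needs Better Evaluation | Reveals hidden failures in robustness: supervised uncertainty quantification needs better evaluation methods. | large language model | |
| 21 | HTAA: Enhancing LLM Planning via Hybrid Toolset Agentization & Adaptation | HTAA: enhances LLM planning through hybrid toolset agentization and adaptation. | large language model | |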

🔬 Pillar 2: RL & Architecture (12 papers)

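| # | Title | Key Point | Tags | 🔗 |
|---|---|---|---|---|
| 22 | Relax: An Asynchronous Reinforcement Learning Engine for Omni-Modal Post-Training at Scale | Relax: an asynchronous reinforcement learning engine for large-scale omni-modal post-training. | reinforcement learning, large language model, multimodal | |
| 23 | BITS Pilani at SemEval-2026 Task 9: Structured Supervised Fine-Tuning with DPO Refinement for Polarization Detection | Proposes combining structured supervised fine-tuning with DPO refinement to improve the accuracy and recall of online polarization detection. | DPO, direct preference optimization, large language model | |
| 24 | OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models | OccuBench: evaluates AI agents on real-world professional tasks via language world models. | world model, world models | |
| 25 | Mem$^2$Evolve: Towards Self-Evolving Agents via Co-Evolutionary Capability Expansion and Experience Distillation | Mem$^2$Evolve: self-evolving agents via co-evolutionary capability expansion and experience distillation. | distillation, large language model | |
| 26 | Shared Emotion Geometry Across Small Language Models: A Cross-Architecture Study of Representation, Behavior, and Methodological Confounds | Reveals an emotion geometry shared across small language models and analyzes the associated methodological confounds. | RLHF, motion representation | |
| 27 | When Valid Signals Fail: Regime Boundaries Between LLM Features and RL Trading Policies | Uses LLMs to generate financial features, but the RL trading policy underperforms under macroeconomic shocks, revealing a gap between feature validity and policy robustness. | reinforcement learning, PPO, large language model | |
| 28 | LangFlow: Continuous Diffusion Rivals Discrete in Language Modeling | LangFlow: the first continuous diffusion language model to rival discrete diffusion. | flow matching, zero-shot transfer | |
| 29 | Playing Along: Learning a Double-Agent Defender for Belief Steering via Theory of Mind | Proposes a theory-of-mind-based double-agent defender that steers beliefs, improving LLM resistance to adversarial attacks. | reinforcement learning, large language model | |
| 30 | Utilizing and Calibrating Hindsight Process Rewards via Reinforcement with Mutual Information Self-Evaluation | Proposes MISE, which calibrates hindsight self-evaluation rewards to address sparse rewards in LLM reinforcement learning. | reinforcement learning, large language model | |
| 31 | Policy Split: Incentivizing Dual-Mode Exploration in LLM Reinforcement with Dual-Mode Entropy Regularization | Proposes Policy Split, which encourages exploration in LLM reinforcement learning through dual-mode entropy regularization. | reinforcement learning, large language model | |
| 32 | Discourse Diversity in Multi-Turn Empathic Dialogue | Proposes the MINT framework to address discourse diversity in multi-turn empathic dialogue. | reinforcement learning, large language model | |
| 33 | HiEdit: Lifelong Model Editing with Hierarchical Reinforcement Learning | HiEdit: lifelong model editing with hierarchical reinforcement learning, improving the efficiency of knowledge updates. | reinforcement learning | |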

🔬 Pillar 1: Robot Control (1 paper)

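| # | Title | Key Point | Tags | 🔗 |
|---|---|---|---|---|
| 34 | DeCoVec: Building Decoding Space based Task Vector for Large Language Models via In-Context Learning | Proposes DeCoVec to address task vector construction for large language models. | manipulation, large language model | |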

🔬 Pillar 6: Video Extraction (1 paper)

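| # | Title | Key Point | Tags | 🔗 |
|---|---|---|---|---|
| 35 | Evaluating Memory Capability in Continuous Lifelog Scenario | Proposes the LifeDialBench benchmark for evaluating memory capability in continuous lifelog scenarios. | egocentric | |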
