cs.CL (2025-10-17)

📊 35 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (29 🔗3) · Pillar 2: RL & Architecture (6)

🔬 Pillar 9: Embodied Foundation Models (29 papers)

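| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | KITE: A Benchmark for Evaluating Korean Instruction-Following Abilities in Large Language Models | KITE: a benchmark for evaluating the Korean instruction-following abilities of large language models | large language model, instruction following | |
| 2 | Evaluating Prompting Strategies and Large Language Models in Systematic Literature Review Screening: Relevance and Task-Stage Classification | Automating systematic literature review screening: evaluates how prompting strategies and large language models interact | large language model, chain-of-thought | |
| 3 | Leveraging LLMs for Context-Aware Implicit Textual and Multimodal Hate Speech Detection | Uses large language models to improve context-aware detection of implicit textual and multimodal hate speech | large language model, multimodal | |
| 4 | Outraged AI: Large language models prioritise emotion over cost in fairness enforcement | Large language models prioritise emotion over cost in fairness enforcement, revealing human-like moral decision-making mechanisms | large language model, foundation model | |
| 5 | Scaling Beyond Context: A Survey of Multimodal Retrieval-Augmented Generation for Document Understanding | Surveys multimodal retrieval-augmented generation for document understanding, addressing the shortcomings of existing methods in structural detail and context modeling | large language model, multimodal | |
| 6 | EgMM-Corpus: A Multimodal Vision-Language Dataset for Egyptian Culture | Proposes EgMM-Corpus, a multimodal vision-language dataset for understanding Egyptian culture | multimodal | |
| 7 | SpeechLLMs for Large-scale Contextualized Zero-shot Slot Filling | Uses speech large language models for large-scale contextualized zero-shot slot filling | large language model, foundation model, instruction following | |
| 8 | Contextual Augmentation for Entity Linking using Large Language Models | Proposes an entity linking method based on LLM contextual augmentation that improves performance on out-of-domain datasets | large language model | |
| 9 | Leveraging Test Driven Development with Large Language Models for Reliable and Verifiable Spreadsheet Code Generation: A Research Framework | Proposes a test-driven development (TDD) framework for LLM code generation that improves the reliability and verifiability of spreadsheet code | large language model | |
| 10 | Controllable Abstraction in Summary Generation for Large Language Models via Prompt Engineering | Proposes a prompt-engineering method for controlling abstraction in summary generation, improving LLM summary quality and controllability | large language model | |
| 11 | Capabilities and Evaluation Biases of Large Language Models in Classical Chinese Poetry Generation: A Case Study on Tang Poetry | Proposes a three-step evaluation framework that reveals the biases of large language models in generating and evaluating classical poetry | large language model | |
| 12 | When Seeing Is not Enough: Revealing the Limits of Active Reasoning in MLLMs | Proposes the GuessBench benchmark, revealing the limitations of MLLMs in active reasoning | large language model, multimodal | |
| 13 | Can LLMs Correct Themselves? A Benchmark of Self-Correction in LLMs | CorrectBench: a comprehensive benchmark for evaluating the self-correction ability of large language models | large language model, chain-of-thought | |
| 14 | In Generative AI We (Dis)Trust? Computational Analysis of Trust and Distrust in Reddit Discussions | Proposes a computational framework based on Reddit data for analyzing public trust and distrust in generative AI | large language model | |
| 15 | PolySkill: Learning Generalizable Skills Through Polymorphic Abstraction | PolySkill: learns generalizable skills through polymorphic abstraction, improving agents' continual learning in open web environments | large language model | |
| 16 | Paper2Web: Let's Make Your Paper Alive! | Paper2Web: proposes PWAgent, a framework for automatically generating academic web pages, to improve paper dissemination | large language model | |
| 17 | Emergence of Linear Truth Encodings in Language Models | Proposes a transparent toy Transformer model that reveals how linear truth encodings emerge in language models | large language model | |
| 18 | LLMs Judge Themselves: A Game-Theoretic Framework for Human-Aligned Evaluation | Proposes a game-theoretic framework for mutual evaluation among LLMs, yielding model evaluation better aligned with human cognition | large language model | |
| 19 | GraphMind: Interactive Novelty Assessment System for Accelerating Scientific Discovery | GraphMind: an interactive novelty assessment system for accelerating scientific discovery | large language model | |
| 20 | Think Parallax: Solving Multi-Hop Problems via Multi-View Knowledge-Graph-Based Retrieval-Augmented Generation | ParallaxRAG: solves multi-hop reasoning problems via multi-view knowledge-graph-based retrieval-augmented generation | large language model | |
| 21 | Rethinking Cross-lingual Gaps from a Statistical Viewpoint | Re-examines cross-lingual gaps from a statistical viewpoint and proposes a variance-control method | large language model | |
| 22 | TokenTiming: A Dynamic Alignment Method for Universal Speculative Decoding Model Pairs | TokenTiming: a dynamic alignment method for universal speculative decoding model pairs | large language model | |
| 23 | LLM Latent Reasoning as Chain of Superposition | Proposes the Latent-SFT framework, which uses latent reasoning chains for efficient, high-performance math problem solving | chain-of-thought | |
| 24 | From Characters to Tokens: Dynamic Grouping with Hierarchical BPE | Proposes a dynamic grouping method based on hierarchical BPE that improves language model efficiency and flexibility | large language model | |
| 25 | Temporal Referential Consistency: Do LLMs Favor Sequences Over Absolute Time References? | Proposes TEMP-ReCon to address temporal referential consistency in LLMs | large language model | |
| 26 | DeceptionBench: A Comprehensive Benchmark for AI Deception Behaviors in Real-world Scenarios | DeceptionBench: a comprehensive benchmark for evaluating AI deception behaviors in real-world scenarios | large language model | |
| 27 | CORE: Reducing UI Exposure in Mobile Agents via Collaboration Between Cloud and Local LLMs | Proposes the CORE framework to reduce UI exposure in mobile agents | large language model | |
| 28 | VocalBench-DF: A Benchmark for Evaluating Speech LLM Robustness to Disfluency | VocalBench-DF: a benchmark for evaluating speech LLM robustness to spoken disfluency | large language model | |
| 29 | When to Ensemble: Identifying Token-Level Points for Stable and Fast LLM Ensembling | Proposes the SAFE framework, which uses selective ensembling to improve the efficiency and stability of LLM ensembling in long-form generation | large language model | |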

🔬 Pillar 2: RL & Architecture (6 papers)

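| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 30 | MCA: Modality Composition Awareness for Robust Composed Multimodal Retrieval | Proposes MCA, a modality composition awareness framework that improves the robustness of composed multimodal retrieval | contrastive learning, large language model, multimodal | |
| 31 | Fine-Tuning MedGemma for Clinical Captioning to Enhance Multimodal RAG over Malaysia CPGs | Fine-tunes MedGemma for clinical image captioning to enhance multimodal RAG over Malaysia CPGs | distillation, multimodal | |
| 32 | InfiMed-ORBIT: Aligning LLMs on Open-Ended Complex Tasks via Rubric-Based Incremental Training | InfiMed-ORBIT: aligns LLMs via rubric-based incremental training to handle open-ended complex medical tasks | reinforcement learning, large language model, instruction following | |
| 33 | AutoGraph-R1: End-to-End Reinforcement Learning for Knowledge Graph Construction | Proposes AutoGraph-R1 to optimize knowledge graph construction and improve question-answering performance | reinforcement learning, policy learning | |
| 34 | POPI: Personalizing LLMs via Optimized Preference Inference | POPI: personalizes LLMs via optimized preference inference | reinforcement learning, large language model | |
| 35 | Infinity Parser: Layout Aware Reinforcement Learning for Scanned Document Parsing | Proposes the LayoutRL framework and the Infinity-Parser model to address generalization in scanned document parsing | reinforcement learning | |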
