cs.CL (2026-03-12)

📊 33 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (29 🔗2) · Pillar 2: RL & Architecture (4)

🔬 Pillar 9: Embodied Foundation Models (29 papers)

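# | Title | One-sentence summary | Tags | 🔗
1 | MaterialFigBENCH: benchmark dataset with figures for evaluating college-level materials science problem-solving abilities of multimodal large language models | MaterialFigBENCH: a figure-based benchmark dataset for evaluating the materials science problem-solving abilities of multimodal LLMs | large language model, multimodal |
2 | SciMDR: Benchmarking and Advancing Scientific Multimodal Document Reasoning | Proposes the SciMDR framework to address dataset construction for scientific multimodal document reasoning | foundation model, multimodal |
3 | Performance Evaluation of Open-Source Large Language Models for Assisting Pathology Report Writing in Japanese | Evaluates the performance of open-source large language models in assisting Japanese pathology report writing | large language model |
4 | UtilityMax Prompting: A Formal Framework for Multi-Objective Large Language Model Optimization | UtilityMax Prompting: proposes a formal-language-based framework for multi-objective large language model optimization | large language model |
5 | To Words and Beyond: Probing Large Language Models for Sentence-Level Psycholinguistic Norms of Memorability and Reading Times | Uses fine-tuned large language models to predict sentence-level psycholinguistic measures: memorability and reading times | large language model |
6 | DatedGPT: Preventing Lookahead Bias in Large Language Models with Time-Aware Pretraining | DatedGPT: prevents lookahead bias in large language models through time-aware pretraining | large language model |
7 | Large Language Models for Biomedical Article Classification | Explores the use of large language models for biomedical article classification and offers practical configuration recommendations | large language model |
8 | BLooP: Zero-Shot Abstractive Summarization using Large Language Models with Bigram Lookahead Promotion | BLooP: zero-shot summarization using large language models with Bigram Lookahead Promotion | large language model |
9 | CoMMET: To What Extent Can LLMs Perform Theory of Mind Tasks? | Proposes the CoMMET multimodal benchmark to evaluate LLM performance on theory-of-mind tasks | large language model, multimodal |
10 | BTZSC: A Benchmark for Zero-Shot Text Classification Across Cross-Encoders, Embedding Models, Rerankers and LLMs | BTZSC: a comprehensive benchmark for zero-shot text classification covering cross-encoders, embedding models, rerankers, and LLMs | large language model |
11 | One Supervisor, Many Modalities: Adaptive Tool Orchestration for Autonomous Queries | Proposes an adaptive tool orchestration framework for autonomous multimodal query processing | multimodal |
12 | Sparking Scientific Creativity via LLM-Driven Interdisciplinary Inspiration | Proposes the Idea-Catalyst framework, which uses LLMs to spark interdisciplinary inspiration and support scientific innovation | large language model |
13 | Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections | Proposes the MADQA benchmark to evaluate the strategic reasoning of multimodal agents over document collections | multimodal |
14 | SemBench: A Universal Semantic Framework for LLM Evaluation | SemBench: a universal semantic evaluation framework for LLMs that automatically generates cross-lingual evaluation benchmarks | large language model |
15 | QAQ: Bidirectional Semantic Coherence for Selecting High-Quality Synthetic Code Instructions | Proposes the QAQ framework, which selects high-quality synthetic code instructions via bidirectional semantic coherence to improve code generation model performance | instruction following |
16 | LifeSim: Long-Horizon User Life Simulator for Personalized Assistant Evaluation | Proposes LifeSim for evaluating personalized assistants in long-horizon user life scenarios | large language model |
17 | Cross-Context Review: Improving LLM Output Quality by Separating Production and Review Sessions | Proposes Cross-Context Review (CCR), which improves LLM output quality by separating production and review sessions | large language model |
18 | CHiL(L)Grader: Calibrated Human-in-the-Loop Short-Answer Grading | Proposes CHiL(L)Grader, a confidence-calibrated human-in-the-loop short-answer grading framework | large language model |
19 | PersonaTrace: Synthesizing Realistic Digital Footprints with LLM Agents | PersonaTrace: synthesizes realistic digital footprints with LLM agents to address data scarcity | large language model |
20 | Legal-DC: Benchmarking Retrieval-Augmented Generation for Legal Documents | Proposes the Legal-DC benchmark and the LegRAG framework to improve RAG performance on Chinese legal documents | large language model |
21 | Where Matters More Than What: Decoding-aligned KV Cache Compression via Position-aware Pseudo Queries | Proposes DapQ: decoding-aligned KV cache compression via position-aware pseudo queries | large language model |
22 | Tiny Aya: Bridging Scale and Multilingual Depth | Tiny Aya: an efficient and balanced multilingual AI model with 3.35 billion parameters | foundation model |
23 | Try, Check and Retry: A Divide-and-Conquer Framework for Boosting Long-context Tool-Calling Performance of LLMs | Proposes the Tool-DC framework to improve LLM performance on long-context tool calling | large language model |
24 | LLM-Assisted Causal Structure Disambiguation and Factor Extraction for Legal Judgment Prediction | Proposes an LLM-assisted causal structure disambiguation and factor extraction method to improve the accuracy and robustness of legal judgment prediction | large language model |
25 | Algorithmic Consequences of Particle Filters for Sentence Processing: Amplified Garden-Paths and Digging-In Effects | Particle filter models reveal amplified ambiguity and "digging-in" effects in sentence processing | large language model |
26 | One Supervisor, Many Modalities: Adaptive Tool Orchestration for Autonomous Queries | Proposes an adaptive tool orchestration framework to optimize multimodal query processing | multimodal |
27 | LLM BiasScope: A Real-Time Bias Analysis Platform for Comparative LLM Evaluation | LLM BiasScope: a platform for real-time bias analysis and comparative evaluation of large language models | large language model |
28 | CSE-UOI at SemEval-2026 Task 6: A Two-Stage Heterogeneous Ensemble with Deliberative Complexity Gating for Political Evasion Detection | Proposes a two-stage method based on a heterogeneous LLM ensemble and deliberative complexity gating for political evasion detection | large language model |
29 | Not Just the Destination, But the Journey: Reasoning Traces Causally Shape Generalization Behaviors | Shows that the reasoning process, rather than the final answer, causally shapes the generalization behaviors of large language models | chain-of-thought |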

🔬 Pillar 2: RL & Architecture (4 papers)

# | Title | One-sentence summary | Tags | 🔗
30 | Bielik-Minitron-7B: Compressing Large Language Models via Structured Pruning and Knowledge Distillation for the Polish Language | Bielik-Minitron-7B: compresses large language models for Polish via structured pruning and knowledge distillation | reinforcement learning, DPO, direct preference optimization |
31 | Multi-Task Reinforcement Learning for Enhanced Multimodal LLM-as-a-Judge | Proposes MT-RL-Judge, which uses multi-task reinforcement learning to improve multimodal LLM-as-a-judge capability | reinforcement learning, large language model, multimodal |
32 | CLASP: Defending Hybrid Large Language Models Against Hidden State Poisoning Attacks | CLASP: defends hybrid large language models against hidden state poisoning attacks | Mamba, SSM, state space model |
33 | IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse | Proposes IndexCache to accelerate sparse attention computation | distillation, large language model |
