cs.CL (2026-03-12)

📊 33 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (29, 🔗 2) · Pillar 2: RL & Architecture (4)

🔬 Pillar 9: Embodied Foundation Models (29 papers)

#	Title	One-line summary	Tags	🔗
1	MaterialFigBENCH: benchmark dataset with figures for evaluating college-level materials science problem-solving abilities of multimodal large language models	MaterialFigBENCH: a figure-based benchmark dataset for evaluating the materials science problem-solving abilities of multimodal LLMs	large language model multimodal
2	SciMDR: Benchmarking and Advancing Scientific Multimodal Document Reasoning	Proposes the SciMDR framework to address dataset construction for scientific multimodal document reasoning	foundation model multimodal
3	Performance Evaluation of Open-Source Large Language Models for Assisting Pathology Report Writing in Japanese	Evaluates how well open-source large language models assist in writing pathology reports in Japanese	large language model
4	UtilityMax Prompting: A Formal Framework for Multi-Objective Large Language Model Optimization	UtilityMax Prompting: a framework grounded in formal language for multi-objective large language model optimization	large language model
5	To Words and Beyond: Probing Large Language Models for Sentence-Level Psycholinguistic Norms of Memorability and Reading Times	Uses fine-tuned large language models to predict sentence-level psycholinguistic measures: memorability and reading times	large language model
6	DatedGPT: Preventing Lookahead Bias in Large Language Models with Time-Aware Pretraining	DatedGPT: preventing lookahead bias in large language models through time-aware pretraining	large language model
7	Large Language Models for Biomedical Article Classification	Explores large language models for biomedical article classification and offers practical configuration recommendations	large language model
8	BLooP: Zero-Shot Abstractive Summarization using Large Language Models with Bigram Lookahead Promotion	BLooP: zero-shot abstractive summarization with large language models using Bigram Lookahead Promotion	large language model	✅
9	CoMMET: To What Extent Can LLMs Perform Theory of Mind Tasks?	Proposes CoMMET, a multimodal benchmark for evaluating LLM performance on theory-of-mind tasks	large language model multimodal
10	BTZSC: A Benchmark for Zero-Shot Text Classification Across Cross-Encoders, Embedding Models, Rerankers and LLMs	BTZSC: a comprehensive zero-shot text classification benchmark covering cross-encoders, embedding models, rerankers and LLMs	large language model
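11	One Supervisor, Many Modalities: Adaptive Tool Orchestration for Autonomous Queries	Proposes an adaptive tool orchestration framework for autonomous multimodal query processing	multimodal
12	Sparking Scientific Creativity via LLM-Driven Interdisciplinary Inspiration	Proposes the Idea-Catalyst framework, which uses LLMs to spark interdisciplinary inspiration and support scientific innovation	large language model
13	Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections	Proposes the MADQA benchmark for evaluating the strategic reasoning of multimodal agents over document collections	multimodal
14	SemBench: A Universal Semantic Framework for LLM Evaluation	SemBench: a universal semantic evaluation framework for LLMs that automatically generates cross-lingual benchmarks	large language model
15	QAQ: Bidirectional Semantic Coherence for Selecting High-Quality Synthetic Code Instructions	Proposes the QAQ framework, which selects high-quality synthetic code instructions via bidirectional semantic coherence to improve code generation models	instruction following
16	LifeSim: Long-Horizon User Life Simulator for Personalized Assistant Evaluation	Proposes LifeSim for evaluating personalized assistants in long-horizon user life scenarios	large language model
17	Cross-Context Review: Improving LLM Output Quality by Separating Production and Review Sessions	Proposes Cross-Context Review (CCR), which improves LLM output quality by separating production and review sessions	large language model
18	CHiL(L)Grader: Calibrated Human-in-the-Loop Short-Answer Grading	Proposes CHiL(L)Grader, a human-in-the-loop short-answer grading framework with calibrated confidence	large language model
19	PersonaTrace: Synthesizing Realistic Digital Footprints with LLM Agents	PersonaTrace: uses LLM agents to synthesize realistic digital footprints, addressing data scarcity	large language model
20	Legal-DC: Benchmarking Retrieval-Augmented Generation for Legal Documents	Proposes the Legal-DC benchmark and the LegRAG framework to improve RAG performance on Chinese legal documents	large language model	✅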
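21	Where Matters More Than What: Decoding-aligned KV Cache Compression via Position-aware Pseudo Queries	Proposes DapQ: decoding-aligned KV cache compression via position-aware pseudo queries	large language model
22	Tiny Aya: Bridging Scale and Multilingual Depth	Tiny Aya: an efficient, balanced multilingual AI model with 3.35 billion parameters	foundation model
23	Try, Check and Retry: A Divide-and-Conquer Framework for Boosting Long-context Tool-Calling Performance of LLMs	Proposes the Tool-DC framework to improve LLM performance on long-context tool calling	large language model
24	LLM-Assisted Causal Structure Disambiguation and Factor Extraction for Legal Judgment Prediction	Proposes an LLM-assisted method for causal structure disambiguation and factor extraction that improves the accuracy and robustness of legal judgment prediction	large language model
25	Algorithmic Consequences of Particle Filters for Sentence Processing: Amplified Garden-Paths and Digging-In Effects	Particle filter models reveal amplified garden-path ambiguity and "digging-in" effects in sentence processing	large language model
26	One Supervisor, Many Modalities: Adaptive Tool Orchestration for Autonomous Queries	Proposes an adaptive tool orchestration framework that optimizes multimodal query processing	multimodal
27	LLM BiasScope: A Real-Time Bias Analysis Platform for Comparative LLM Evaluation	LLM BiasScope: a platform for real-time bias analysis and comparative evaluation of large language models	large language model
28	CSE-UOI at SemEval-2026 Task 6: A Two-Stage Heterogeneous Ensemble with Deliberative Complexity Gating for Political Evasion Detection	Proposes a two-stage method for political evasion detection built on a heterogeneous LLM ensemble with deliberative complexity gating	large language model
29	Not Just the Destination, But the Journey: Reasoning Traces Causally Shape Generalization Behaviors	Shows that reasoning traces, not only final answers, causally shape the generalization behaviors of large language models	chain-of-thought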

🔬 Pillar 2: RL & Architecture (4 papers)

#	Title	One-line summary	Tags
30	Bielik-Minitron-7B: Compressing Large Language Models via Structured Pruning and Knowledge Distillation for the Polish Language	Bielik-Minitron-7B: compressing large language models for Polish via structured pruning and knowledge distillation	reinforcement learning DPO direct preference optimization
31	Multi-Task Reinforcement Learning for Enhanced Multimodal LLM-as-a-Judge	Proposes MT-RL-Judge, which uses multi-task reinforcement learning to improve multimodal LLM-as-a-judge capabilities	reinforcement learning large language model multimodal
32	CLASP: Defending Hybrid Large Language Models Against Hidden State Poisoning Attacks	CLASP: defending hybrid large language models against hidden state poisoning attacks	Mamba SSM state space model
33	IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse	Proposes IndexCache, which accelerates sparse attention computation by reusing indices across layers	distillation large language model
