cs.CL (2026-01-20)

📊 36 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (28 🔗2) · Pillar 2: RL & Architecture (8)

🔬 Pillar 9: Embodied Foundation Models (28 papers)

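| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Chain-of-Thought Compression Should Not Be Blind: V-Skip for Efficient Multimodal Reasoning via Dual-Path Anchoring | Proposes V-Skip, which uses dual-path anchoring to address visual amnesia in multimodal CoT reasoning and achieve efficient compression. | large language model, multimodal, chain-of-thought | |
| 2 | FutureOmni: Evaluating Future Forecasting from Omni-Modal Context for Multimodal LLMs | FutureOmni: the first benchmark for evaluating multimodal LLMs' ability to forecast the future from omni-modal context. | large language model, multimodal | |
| 3 | Pro-AI Bias in Large Language Models | Reveals a pro-AI bias in large language models that may influence decision-making. | large language model | |
| 4 | RECAP: A Resource-Efficient Method for Adversarial Prompting in Large Language Models | RECAP: a resource-efficient adversarial prompting method for LLMs that lowers compute cost through retrieval and reuse. | large language model | |
| 5 | Domain-Adaptation through Synthetic Data: Fine-Tuning Large Language Models for German Law | Fine-tunes large language models on synthetic data to improve question answering in the German legal domain. | large language model | |
| 6 | Towards robust long-context understanding of large language model via active recap learning | Proposes the Active Recap Learning (ARL) framework to strengthen LLMs' long-context understanding. | large language model | |
| 7 | No Reliable Evidence of Self-Reported Sentience in Small Large Language Models | Using classification of internal activations as a check, finds that small LLMs self-report no sentience. | large language model | |
| 8 | Large Language Models for Large-Scale, Rigorous Qualitative Analysis in Applied Health Services Research | Proposes a human-AI collaborative framework that uses large language models for efficient, rigorous, large-scale qualitative health services research. | large language model | |
| 9 | BACH-V: Bridging Abstract and Concrete Human-Values in Large Language Models | BACH-V: bridging abstract and concrete human values in large language models. | large language model | |
| 10 | Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models | Proposes a "locate, steer, improve" framework for actionable mechanistic interpretability in large language models. | large language model | |
| 11 | OpenLearnLM Benchmark: A Unified Framework for Evaluating Knowledge, Skill, and Attitude in Educational Large Language Models | OpenLearnLM: a unified benchmark for evaluating the knowledge, skills, and attitudes of educational large language models. | large language model | |
| 12 | Activation-Space Anchored Access Control for Multi-Class Permission Reasoning in Large Language Models | Proposes the AAAC framework, which uses activation-space anchoring for multi-class permission control in large language models. | large language model | |
| 13 | Top 10 Open Challenges Steering the Future of Diffusion Language Model and Its Variants | Analyzes ten major challenges for the future of diffusion language models, exploring new directions for AI beyond the autoregressive paradigm. | large language model, multimodal | |
| 14 | NewsRECON: News article REtrieval for image CONtextualization | NewsRECON: proposes a news article retrieval method for image contextualization, addressing cases where reverse image search fails. | large language model, multimodal | |
| 15 | Dimension-First Evaluation of Speech-to-Speech Models with Structured Acoustic Cues | Proposes the TRACE framework for efficient, human-aligned speech evaluation. | large language model, chain-of-thought | |
| 16 | CommunityBench: Benchmarking Community-Level Alignment across Diverse Groups and Tasks | Proposes CommunityBench for evaluating LLMs' community-level value alignment across diverse groups and tasks. | large language model, foundation model | |
| 17 | Uncertainty-Aware Gradient Signal-to-Noise Data Selection for Instruction Tuning | Proposes GRADFILTERING, which uses uncertainty to guide data selection for instruction tuning and improve LLM efficiency. | large language model | |
| 18 | XCR-Bench: A Multi-Task Benchmark for Evaluating Cultural Reasoning in LLMs | Proposes the XCR-Bench benchmark for evaluating cultural reasoning in large language models. | large language model | |
| 19 | OP-Bench: Benchmarking Over-Personalization for Memory-Augmented Personalized Conversational Agents | Proposes the OP-Bench benchmark for evaluating over-personalization in memory-augmented conversational agents. | large language model | |
| 20 | Simulated Ignorance Fails: A Systematic Study of LLM Behaviors on Forecasting Problems Before Model Knowledge Cutoff | Reveals the limitations of "simulated ignorance" in LLM forecasting and advises against using it for retrospective benchmarking. | chain-of-thought | |
| 21 | Fairness or Fluency? An Investigation into Language Bias of Pairwise LLM-as-a-Judge | Finds that pairwise LLM judges exhibit significant language bias and analyzes its relationship to perplexity. | large language model | |
| 22 | TREX: Tokenizer Regression for Optimal Data Mixture | TREX: optimizes data mixture ratios via tokenizer regression to improve the efficiency of multilingual LLM tokenizers. | large language model | |
| 23 | When Wording Steers the Evaluation: Framing Bias in LLM judges | Reveals framing bias in LLM judging: the framing of the prompt affects the judge's verdict. | large language model | |
| 24 | Can LLM Reasoning Be Trusted? A Comparative Study: Using Human Benchmarking on Statistical Tasks | Fine-tuning LLMs improves their statistical reasoning, with applications in education and automated assessment. | large language model | |
| 25 | HALT: Hallucination Assessment via Latent Testing | HALT: assesses hallucination in large language models through latent-space testing. | large language model | |
| 26 | From Quotes to Concepts: Axial Coding of Political Debates with Ensemble LMs | Uses ensemble language models for axial coding of political debates, moving from quotes to concepts. | large language model | |
| 27 | GerAV: Towards New Heights in German Authorship Verification using Fine-Tuned LLMs on a New Benchmark | Proposes GerAV, a new benchmark for German authorship verification, and reaches new heights with fine-tuned LLMs. | large language model | |
| 28 | Beyond Known Facts: Generating Unseen Temporal Knowledge to Address Data Contamination in LLM Evaluation | Proposes an evaluation method based on generating future knowledge to address data contamination when evaluating LLMs on temporal knowledge graph extraction. | large language model | |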

🔬 Pillar 2: RL & Architecture (8 papers)

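| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 29 | Pedagogical Alignment for Vision-Language-Action Models: A Comprehensive Framework for Data, Architecture, and Evaluation in Education | Proposes the Pedagogical VLA Framework for interpretable VLA models in resource-constrained educational settings. | distillation, vision-language-action, VLA | |
| 30 | Temporal-Spatial Decouple before Act: Disentangled Representation Learning for Multimodal Sentiment Analysis | Proposes the TSDA model, which improves multimodal sentiment analysis through temporal-spatial decoupled representation learning. | representation learning, spatiotemporal, multimodal | |
| 31 | "The Whole Is Greater Than the Sum of Its Parts": A Compatibility-Aware Multi-Teacher CoT Distillation Framework | Proposes the COMPACT framework, which improves small-model reasoning through compatibility-aware multi-teacher CoT distillation. | teacher-student, distillation, large language model | |
| 32 | RM-Distiller: Exploiting Generative LLM for Reward Model Distillation | Proposes RM-Distiller, which uses generative LLMs for reward model distillation to improve alignment. | reinforcement learning, distillation, large language model | |
| 33 | Dr. Assistant: Enhancing Clinical Diagnostic Inquiry via Structured Diagnostic Reasoning Data and Reinforcement Learning | Proposes Dr. Assistant, which enhances clinical diagnostic inquiry through structured reasoning data and reinforcement learning. | reinforcement learning, large language model | |
| 34 | ICPO: Illocution-Calibrated Policy Optimization for Multi-Turn Conversation | ICPO: proposes illocution-calibrated policy optimization to handle instruction ambiguity in multi-turn conversation. | reinforcement learning, large language model | |
| 35 | Which Reasoning Trajectories Teach Students to Reason Better? A Simple Metric of Informative Alignment | Proposes the Rank-Surprisal Ratio (RSR) metric for assessing how effectively reasoning trajectories teach student models. | distillation, chain-of-thought | |
| 36 | Knowledge Graph-Assisted LLM Post-Training for Enhanced Legal Reasoning | Proposes a knowledge graph-assisted LLM post-training method that improves legal-domain reasoning. | DPO, direct preference optimization | |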
