cs.CL (2025-07-08)

📊 29 papers in total | 🔗 7 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (23 🔗4) · Pillar 2: RL Algorithms & Architecture (6 🔗3)

🔬 Pillar 9: Embodied Foundation Models (23 papers)

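| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Curved Inference: Concern-Sensitive Geometry in Large Language Model Residual Streams | Proposes the Curved Inference framework for geometric interpretability of large language models. | large language model | |
| 2 | A Survey on Latent Reasoning | Surveys latent reasoning, an emerging paradigm in which large language models perform multi-step reasoning in latent space. | large language model, multimodal, chain-of-thought | |
| 3 | UQLM: A Python Package for Uncertainty Quantification in Large Language Models | UQLM: a Python toolkit that detects LLM hallucinations using uncertainty quantification. | large language model | |
| 4 | Coding Triangle: How Does Large Language Model Understand Code? | Proposes the Code Triangle framework for systematically evaluating LLM abilities in code understanding and generation. | large language model | |
| 5 | Remember Past, Anticipate Future: Learning Continual Multimodal Misinformation Detectors | Proposes DAEDCMD to address knowledge forgetting and an evolving environment in continual multimodal misinformation detection. | multimodal | |
| 6 | Unveiling Effective In-Context Configurations for Image Captioning: An External & Internal Analysis | Presents an external and internal analysis of multimodal in-context learning for image captioning, revealing effective configuration strategies. | large language model, multimodal | |
| 7 | HIRAG: Hierarchical-Thought Instruction-Tuning Retrieval-Augmented Generation | Proposes HIRAG, a hierarchical-thought instruction-tuning method for retrieval-augmented generation that improves open-ended question answering. | large language model, chain-of-thought | |
| 8 | Exploring Task Performance with Interpretable Models via Sparse Auto-Encoders | Uses sparse autoencoders to improve LLM interpretability and downstream task performance. | large language model | |
| 9 | Reward Models Can Improve Themselves: Reward-Guided Adversarial Failure Mode Discovery for Robust Reward Modeling | REFORM: improves reward model robustness through reward-guided adversarial failure mode discovery. | large language model | |
| 10 | Humans overrely on overconfident language models, across languages | Shows that, across languages, humans overrely on language models and are swayed by their overconfident expressions. | large language model | |
| 11 | Efficiency-Effectiveness Reranking FLOPs for LLM-based Rerankers | Proposes RPP and QPP, FLOPs-based efficiency metrics for LLM rerankers that avoid the hardware dependence of existing evaluation methods. | large language model | |
| 12 | Entropy-Memorization Law: Evaluating Memorization Difficulty of Data in LLMs | Proposes the Entropy-Memorization Law to evaluate how hard data is for LLMs to memorize, and uses it for dataset inference. | large language model | |
| 13 | DocIE@XLLM25: In-Context Learning for Information Extraction using Fully Synthetic Demonstrations | Proposes an in-context learning method based on fully synthetic demonstrations for document-level information extraction. | large language model | |
| 14 | RabakBench: Scaling Human Annotations to Construct Localized Multilingual Safety Benchmarks for Low-Resource Languages | RabakBench: builds scalable multilingual safety benchmarks for low-resource languages. | large language model | |
| 15 | OpenFActScore: Open-Source Atomic Evaluation of Factuality in Text Generation | Proposes OpenFActScore for open-source evaluation of factuality in text generation. | large language model | |
| 16 | Few-shot text-based emotion detection | Uses large language models and few-shot learning for text-based emotion detection, achieving the best results on the Emakhuwa data. | large language model | |
| 17 | AI-Reporter: A Path to a New Genre of Scientific Communication | AI-Reporter: rapidly turns academic presentations into publishable scientific papers. | large language model | |
| 18 | Psychometric Item Validation Using Virtual Respondents with Trait-Response Mediators | Proposes a virtual respondent framework for validating psychometric questionnaire items. | large language model | |
| 19 | Bridging Perception and Language: A Systematic Benchmark for LVLMs' Understanding of Amodal Completion Reports | Builds a benchmark of LVLM perceptual ability and analyzes how models differ in understanding the completion of partially missing information (amodal completion). | multimodal | |
| 20 | Flippi: End To End GenAI Assistant for E-Commerce | Flippi: an end-to-end generative AI assistant for e-commerce that improves the shopping experience. | large language model | |
| 21 | DocTalk: Scalable Graph-based Dialogue Synthesis for Enhancing LLM Conversational Capabilities | DocTalk: proposes a scalable graph-based dialogue synthesis method to enhance LLM conversational capabilities. | large language model | |
| 22 | DRAGON: Dynamic RAG Benchmark On News | DRAGON: proposes the first dynamic RAG benchmark for Russian, for evaluating retrieval-augmented generation systems on news. | large language model | |
| 23 | Smoothie-Qwen: Post-Hoc Smoothing to Reduce Language Bias in Multilingual LLMs | Smoothie-Qwen: reduces language bias in multilingual LLMs through post-hoc smoothing. | large language model | |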

🔬 Pillar 2: RL Algorithms & Architecture (6 papers)

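| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 24 | Perception-Aware Policy Optimization for Multimodal Reasoning | Proposes the PAPO algorithm, which improves multimodal reasoning through perception-aware policy optimization. | reinforcement learning, large language model, multimodal | |
| 25 | Skywork-R1V3 Technical Report | Skywork-R1V3: transfers the reasoning ability of text LLMs to vision-language models through reinforcement learning. | curriculum learning, large language model, multimodal | |
| 26 | "Amazing, They All Lean Left" -- Analyzing the Political Temperaments of Current LLMs | Analyzes the political leanings of mainstream LLMs, revealing a widespread liberal tilt and its causes. | reinforcement learning, RLHF, large language model | |
| 27 | A Systematic Analysis of Hybrid Linear Attention | Systematically analyzes hybrid linear attention to improve the efficiency and recall of long-sequence modeling. | linear attention | |
| 28 | CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization | Proposes CriticLean, a critic-guided reinforcement learning framework for mathematical formalization. | reinforcement learning | |
| 29 | Agentic-R1: Distilled Dual-Strategy Reasoning | Agentic-R1: improves performance and efficiency on complex reasoning tasks through dual-strategy distillation. | distillation, chain-of-thought | |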
