cs.CL (2025-07-08)

📊 29 papers in total | 🔗 7 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (23 🔗4) Pillar 2: RL & Architecture (6 🔗3)

🔬 Pillar 9: Embodied Foundation Models (23 papers)

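#	Title	One-line summary	Tags	🔗
1	Curved Inference: Concern-Sensitive Geometry in Large Language Model Residual Streams	Proposes the Curved Inference framework to address the geometric interpretability of large language models	large language model
2	A Survey on Latent Reasoning	Surveys latent reasoning, a new paradigm in which large language models perform multi-step reasoning in latent space	large language model multimodal chain-of-thought	✅
3	UQLM: A Python Package for Uncertainty Quantification in Large Language Models	UQLM: a Python toolkit for detecting LLM hallucinations based on uncertainty quantification	large language model
4	Coding Triangle: How Does Large Language Model Understand Code?	Proposes the Code Triangle framework to systematically evaluate large language models' abilities in code understanding and generation	large language model
5	Remember Past, Anticipate Future: Learning Continual Multimodal Misinformation Detectors	Proposes DAEDCMD to address knowledge forgetting and environment evolution in continual multimodal misinformation detection	multimodal
6	Unveiling Effective In-Context Configurations for Image Captioning: An External & Internal Analysis	For image captioning, proposes external and internal analyses of multimodal in-context learning that reveal effective configuration strategies	large language model multimodal
7	HIRAG: Hierarchical-Thought Instruction-Tuning Retrieval-Augmented Generation	Proposes HIRAG, a hierarchical-thought instruction-tuning method for retrieval-augmented generation that improves open-ended question answering	large language model chain-of-thought
8	Exploring Task Performance with Interpretable Models via Sparse Auto-Encoders	Uses sparse autoencoders to improve LLM interpretability and downstream task performance	large language model
9	Reward Models Can Improve Themselves: Reward-Guided Adversarial Failure Mode Discovery for Robust Reward Modeling	REFORM: improves reward model robustness through reward-guided adversarial failure mode discovery	large language model
10	Humans overrely on overconfident language models, across languages	Shows that humans overrely on language models in multilingual settings and are readily swayed by their overconfident expressions	large language model
11	Efficiency-Effectiveness Reranking FLOPs for LLM-based Rerankers	Proposes RPP and QPP, FLOPs-based efficiency metrics for LLM rerankers that address the hardware dependence of existing evaluation methods	large language model	✅
12	Entropy-Memorization Law: Evaluating Memorization Difficulty of Data in LLMs	Proposes the Entropy-Memorization Law to evaluate how difficult data is for LLMs to memorize and to enable dataset inference	large language model
13	DocIE@XLLM25: In-Context Learning for Information Extraction using Fully Synthetic Demonstrations	Proposes an in-context learning method based on fully synthetic demonstrations for document-level information extraction	large language model
14	RabakBench: Scaling Human Annotations to Construct Localized Multilingual Safety Benchmarks for Low-Resource Languages	RabakBench: builds a scalable multilingual safety benchmark for low-resource languages	large language model
15	OpenFActScore: Open-Source Atomic Evaluation of Factuality in Text Generation	Proposes OpenFActScore for open-source evaluation of factuality in text generation	large language model	✅
16	Few-shot text-based emotion detection	Uses large language models and few-shot learning for text-based emotion detection, achieving the best results on an Emakhuwa corpus	large language model
17	AI-Reporter: A Path to a New Genre of Scientific Communication	AI-Reporter: rapidly converts academic presentations into publishable scientific papers	large language model
18	Psychometric Item Validation Using Virtual Respondents with Trait-Response Mediators	Proposes a virtual-respondent framework to address item validation for psychometric questionnaires	large language model
19	Bridging Perception and Language: A Systematic Benchmark for LVLMs' Understanding of Amodal Completion Reports	Builds a benchmark of LVLM perceptual ability and analyzes how models differ in understanding the completion of incomplete information	multimodal
20	Flippi: End To End GenAI Assistant for E-Commerce	Flippi: an end-to-end generative AI assistant for e-commerce that improves the shopping experience	large language model
21	DocTalk: Scalable Graph-based Dialogue Synthesis for Enhancing LLM Conversational Capabilities	DocTalk: proposes a scalable graph-based dialogue synthesis method to enhance LLM conversational capabilities	large language model	✅
22	DRAGON: Dynamic RAG Benchmark On News	DRAGON: proposes the first dynamic RAG benchmark for Russian, for evaluating retrieval-augmented generation systems in the news domain	large language model
23	Smoothie-Qwen: Post-Hoc Smoothing to Reduce Language Bias in Multilingual LLMs	Smoothie-Qwen: reduces language bias in multilingual LLMs through post-hoc smoothing	large language model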

🔬 Pillar 2: RL & Architecture (6 papers)

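#	Title	One-line summary	Tags	🔗
24	Perception-Aware Policy Optimization for Multimodal Reasoning	Proposes the PAPO algorithm, which improves multimodal reasoning through perception-aware policy optimization	reinforcement learning large language model multimodal	✅
25	Skywork-R1V3 Technical Report	Skywork-R1V3: transfers the reasoning ability of text LLMs to vision-language models via reinforcement learning	curriculum learning large language model multimodal
26	"Amazing, They All Lean Left" -- Analyzing the Political Temperaments of Current LLMs	Analyzes the political leanings of mainstream LLMs, revealing a widespread liberal tendency and its causes	reinforcement learning RLHF large language model
27	A Systematic Analysis of Hybrid Linear Attention	Systematically analyzes hybrid linear attention mechanisms to improve efficiency and recall in long-sequence modeling	linear attention	✅
28	CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization	Proposes CriticLean, a critic-guided reinforcement learning framework for mathematical formalization	reinforcement learning
29	Agentic-R1: Distilled Dual-Strategy Reasoning	Agentic-R1: improves performance and efficiency on complex reasoning tasks through dual-strategy distillation	distillation chain-of-thought	✅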
