cs.AI (2025-09-27)

📊 28 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (23 🔗3) · Pillar 2: RL & Architecture (5 🔗1)

🔬 Pillar 9: Embodied Foundation Models (23 papers)

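#	Title	Key Point	Tags	🔗
1	ABC-Eval: Benchmarking Large Language Models on Symbolic Music Understanding and Instruction Following	Proposes the ABC-Eval benchmark to evaluate large language models on symbolic music understanding and instruction following	large language model, instruction following
2	Training Vision-Language Process Reward Models for Test-Time Scaling in Multimodal Reasoning: Key Insights and Lessons Learned	Proposes a hybrid data synthesis framework and perception-focused supervision to improve multimodal reasoning in vision-language models	large language model, multimodal, visual grounding
3	AudioRole: An Audio Dataset for Character Role-Playing in Large Language Models	Proposes the AudioRole dataset to improve large language models' performance in audio role-playing	large language model, multimodal
4	Transferring Vision-Language-Action Models to Industry Applications: Architectures, Performance, and Challenges	Evaluates and improves the performance of vision-language-action models in industrial scenarios	vision-language-action, VLA
5	Measuring Physical-World Privacy Awareness of Large Language Models: An Evaluation Benchmark	Proposes the EAPrivacy benchmark to evaluate the privacy awareness of embodied agents in the physical world	large language model	✅
6	Fact Grounded Attention: Eliminating Hallucination in Large Language Models Through Attention Level Knowledge Integration	Proposes Fact Grounded Attention, which injects knowledge at the attention level to eliminate factual hallucination in large language models	large language model
7	Artificial Phantasia: Evidence for Propositional Reasoning-Based Mental Imagery in Large Language Models	Proposes a mental imagery task based on propositional reasoning to evaluate the complex cognitive abilities of large language models	large language model
8	CATMark: A Context-Aware Thresholding Framework for Robust Cross-Task Watermarking in Large Language Models	Proposes CATMark, a context-aware thresholding watermarking framework that improves the robustness and text quality of cross-task watermarking in large language models	large language model
9	GUI-PRA: Process Reward Agent for GUI Tasks	GUI-PRA: a process reward agent for GUI tasks that addresses the "lost in the middle" problem and UI state awareness in long-horizon tasks	large language model, multimodal
10	Agentic AI Reasoning for Mobile Edge General Intelligence: Fundamentals, Approaches, and Directions	Proposes an Agentic AI-based reasoning framework for mobile edge general intelligence that optimizes LLM deployment in resource-constrained environments	large language model, chain-of-thought
11	VeriGRAG: Enhancing LLM-Based Verilog Code Generation with Structure-Aware Soft Prompts	VeriGRAG: enhances LLM-based Verilog code generation with structure-aware soft prompts	large language model, multimodal
12	Your Dense Retriever is Secretly an Expeditious Reasoner	Proposes AdaQR, an adaptive hybrid query rewriting framework that improves the efficiency of reasoning-based retrieval	large language model
13	PARROT: A Benchmark for Evaluating LLMs in Cross-System SQL Translation	PARROT: a benchmark for evaluating LLMs on cross-system SQL translation	large language model	✅
14	Understanding and Enhancing the Planning Capability of Language Models via Multi-Token Prediction	Uses multi-token prediction to strengthen language models' ability to learn transitive relations in complex planning	large language model
15	MathBode: Measuring the Stability of LLM Reasoning using Frequency Response	MathBode: measures the stability of LLM reasoning using frequency response	large language model
16	ReliabilityRAG: Effective and Provably Robust Defense for RAG-based Web-Search	Proposes ReliabilityRAG, which uses document reliability information to make RAG more robust in web search and to defend against prompt injection attacks	large language model
17	Model Consistency as a Cheap yet Predictive Proxy for LLM Elo Scores	Proposes a model-consistency-based proxy for LLM Elo scores that needs no human evaluation and predicts model performance efficiently	large language model
18	GeoBS: Information-Theoretic Quantification of Geographic Bias in AI Models	Proposes GeoBS, an information-theoretic framework for quantifying geographic bias in AI models	foundation model
19	NeuroBridge: Using Generative AI to Bridge Cross-neurotype Communication Differences through Neurotypical Perspective-taking	NeuroBridge: uses generative AI and neurotypical perspective-taking to bridge cross-neurotype communication differences	large language model
20	Scaling LLM Test-Time Compute with Mobile NPU on Smartphones	Proposes parallel test-time scaling techniques for LLMs on mobile NPUs to improve small-model performance	large language model
21	p-less Sampling: A Robust Hyperparameter-Free Approach for LLM Decoding	Proposes p-less sampling, a hyperparameter-free LLM decoding method that improves generation quality and efficiency	large language model	✅
22	AutoEP: LLMs-Driven Automation of Hyperparameter Evolution for Metaheuristic Algorithms	AutoEP: uses LLM-driven hyperparameter evolution to automatically optimize metaheuristic algorithms	large language model
23	Kimi-Dev: Agentless Training as Skill Prior for SWE-Agents	Kimi-Dev: uses agentless training as a skill prior to improve the performance of software engineering agents	large language model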

🔬 Pillar 2: RL & Architecture (5 papers)

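#	Title	Key Point	Tags	🔗
24	Toward Effective Tool-Integrated Reasoning via Self-Evolved Preference Learning	Proposes the Tool-Light framework, which uses self-evolved preference learning to improve the efficiency and accuracy of LLM tool-integrated reasoning	preference learning, DPO, direct preference optimization
25	Multiplayer Nash Preference Optimization	Proposes MNPO, which extends NLHF to multiplayer games to improve alignment with complex preferences	reinforcement learning, RLHF, large language model	✅
26	Mapping Overlaps in Benchmarks through Perplexity in the Wild	Uses perplexity analysis to reveal overlaps and differences among large language model evaluation benchmarks	world model, large language model, instruction following
27	Risk Profiling and Modulation for LLMs	Proposes a risk profiling and modulation pipeline for LLMs and explores how post-training affects risk preferences	RLHF, large language model
28	Coordination Requires Simplification: Thermodynamic Bounds on Multi-Objective Compromise in Natural and Artificial Intelligence	Proposes a thermodynamic theory of coordination, showing that multi-objective coordination requires simplifying information to cope with thermodynamic constraints	reinforcement learning, large language model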
