cs.AI (2025-09-27)

📊 32 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (27 papers, 🔗 3 with code) | Pillar 2: RL & Architecture (5 papers, 🔗 1 with code)

🔬 Pillar 9: Embodied Foundation Models (27 papers)

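# | Title | One-line summary | Tags | 🔗
1 | ABC-Eval: Benchmarking Large Language Models on Symbolic Music Understanding and Instruction Following | Proposes the ABC-Eval benchmark to evaluate large language models on symbolic music understanding and instruction following | large language model, instruction following |
2 | Training Vision-Language Process Reward Models for Test-Time Scaling in Multimodal Reasoning: Key Insights and Lessons Learned | Proposes a hybrid data-synthesis framework and perception-focused supervision to improve the multimodal reasoning of vision-language models | large language model, multimodal, visual grounding |
3 | AudioRole: An Audio Dataset for Character Role-Playing in Large Language Models | Proposes the AudioRole dataset to improve audio-based personalization of large language models in role-playing | large language model, multimodal |
4 | Transferring Vision-Language-Action Models to Industry Applications: Architectures, Performance, and Challenges | Evaluates the performance and challenges of vision-language-action models in industrial applications and analyzes the feasibility of deploying them | vision-language-action, VLA |
5 | Measuring Physical-World Privacy Awareness of Large Language Models: An Evaluation Benchmark | Proposes the EAPrivacy benchmark to evaluate the privacy awareness of embodied agents in the physical world | large language model |
6 | Fact Grounded Attention: Eliminating Hallucination in Large Language Models Through Attention Level Knowledge Integration | Proposes Fact Grounded Attention, which injects knowledge into the attention mechanism to eliminate factual hallucinations in large language models | large language model |
7 | Artificial Phantasia: Evidence for Propositional Reasoning-Based Mental Imagery in Large Language Models | Proposes mental-imagery tasks based on propositional reasoning to evaluate the complex cognitive abilities of large language models | large language model |
8 | CATMark: A Context-Aware Thresholding Framework for Robust Cross-Task Watermarking in Large Language Models | Proposes CATMark, a context-aware thresholding framework for robust cross-task watermark embedding in large language models | large language model |
9 | Local Success Does Not Compose: Benchmarking Large Language Models for Compositional Formal Verification | DafnyCOMP: a benchmark for evaluating large language models on compositional formal verification | large language model |
10 | Deceive, Detect, and Disclose: Large Language Models Play Mini-Mafia | Proposes the Mini-Mafia benchmark for testing the social intelligence of LLMs, evaluating their abilities to deceive, detect deception, and disclose information | large language model |
11 | GUI-PRA: Process Reward Agent for GUI Tasks | Proposes GUI-PRA, which uses dynamic memory and UI awareness to improve process reward models on GUI tasks | large language model, multimodal |
12 | Agentic AI Reasoning for Mobile Edge General Intelligence: Fundamentals, Approaches, and Directions | Proposes an agentic AI reasoning framework for mobile edge general intelligence that optimizes resource efficiency and reasoning quality | large language model, chain-of-thought |
13 | VeriGRAG: Enhancing LLM-Based Verilog Code Generation with Structure-Aware Soft Prompts | VeriGRAG: enhancing LLM-based Verilog code generation with structure-aware soft prompts | large language model, multimodal |
14 | Your Dense Retriever is Secretly an Expeditious Reasoner | Proposes AdaQR, an adaptive hybrid query-rewriting framework that improves the efficiency of reasoning-intensive retrieval | large language model |
15 | PARROT: A Benchmark for Evaluating LLMs in Cross-System SQL Translation | PARROT: a benchmark for evaluating LLMs on cross-system SQL translation | large language model |
16 | Understanding and Enhancing the Planning Capability of Language Models via Multi-Token Prediction | Uses multi-token prediction to strengthen the reasoning of language models in complex planning | large language model |
17 | MathBode: Measuring the Stability of LLM Reasoning using Frequency Response | MathBode: measuring the stability of LLM mathematical reasoning via frequency response | large language model |
18 | ReliabilityRAG: Effective and Provably Robust Defense for RAG-based Web-Search | Proposes ReliabilityRAG, which uses document reliability information to make RAG-based web search more robust against attacks on the retrieval corpus | large language model |
19 | Model Consistency as a Cheap yet Predictive Proxy for LLM Elo Scores | Proposes an efficient, model-consistency-based proxy for LLM Elo scores that requires no human evaluation | large language model |
20 | GeoBS: Information-Theoretic Quantification of Geographic Bias in AI Models | Proposes the GeoBS framework, which uses information theory to quantify geographic bias in AI models while accounting for spatial factors | foundation model |
21 | NeuroBridge: Using Generative AI to Bridge Cross-neurotype Communication Differences through Neurotypical Perspective-taking | NeuroBridge: using generative AI and neurotypical perspective-taking to bridge cross-neurotype communication differences | large language model |
22 | Scaling LLM Test-Time Compute with Mobile NPU on Smartphones | Proposes a test-time parallel scaling method for LLMs on mobile NPUs that improves the performance of small models | large language model |
23 | p-less Sampling: A Robust Hyperparameter-Free Approach for LLM Decoding | Proposes p-less sampling, a robust, hyperparameter-free LLM decoding strategy that improves generation quality | large language model |
24 | AutoEP: LLMs-Driven Automation of Hyperparameter Evolution for Metaheuristic Algorithms | AutoEP: automatically optimizing metaheuristic algorithms through LLM-driven hyperparameter evolution | large language model |
25 | BuildBench: Benchmarking LLM Agents on Compiling Real-World Open-Source Software | BuildBench: benchmarking LLM agents on compiling real-world open-source software | large language model |
26 | Kimi-Dev: Agentless Training as Skill Prior for SWE-Agents | Kimi-Dev: using agentless training as a skill prior to improve software-engineering agents | large language model |
27 | LLM Watermark Evasion via Bias Inversion | Proposes the Bias-Inversion Rewriting Attack, which effectively evades LLM watermarks | large language model |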

🔬 Pillar 2: RL & Architecture (5 papers)

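# | Title | One-line summary | Tags | 🔗
28 | Toward Effective Tool-Integrated Reasoning via Self-Evolved Preference Learning | Proposes the Tool-Light framework, which uses self-evolved preference learning to improve the efficiency and accuracy of tool-integrated reasoning in LLMs | preference learning, DPO, direct preference optimization |
29 | Multiplayer Nash Preference Optimization | Proposes Multiplayer Nash Preference Optimization (MNPO) to improve LLM alignment under complex preferences | reinforcement learning, RLHF, large language model |
30 | Mapping Overlaps in Benchmarks through Perplexity in the Wild | Analyzes overlap among benchmarks via perplexity to reveal correlations between LLM capabilities | world model, large language model, instruction following |
31 | Risk Profiling and Modulation for LLMs | Proposes a framework for profiling and modulating LLM risk, revealing how risk preferences differ among models at different training stages | RLHF, large language model |
32 | Coordination Requires Simplification: Thermodynamic Bounds on Multi-Objective Compromise in Natural and Artificial Intelligence | Proposes a thermodynamic theory of coordination to address multi-objective coordination problems | reinforcement learning, large language model |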
