cs.AI (2025-09-30)

📊 59 papers in total | 🔗 8 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (43 🔗7) · Pillar 2: RL & Architecture (13 🔗1) · Pillar 1: Robot Control (2) · Pillar 3: Perception & Semantics (1)

🔬 Pillar 9: Embodied Foundation Models (43 papers)

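# | Title | Key point | Tags | 🔗
1 | Towards Unified Multimodal Misinformation Detection in Social Media: A Benchmark Dataset and Baseline | Proposes the OmniFake dataset and the UMFDet framework to detect both human-crafted and AI-generated multimodal misinformation on social media in a unified way. | multimodal, chain-of-thought
2 | CHAI: Command Hijacking against embodied AI | Proposes CHAI to address command hijacking against embodied AI. | embodied AI, multimodal
3 | Emergent evaluation hubs in a decentralizing large language model ecosystem | Reveals the centralizing trend of evaluation benchmarks in the large language model ecosystem and its impact. | large language model, foundation model
4 | Reasoning-Aware Prompt Orchestration: A Foundation Model for Multi-Agent Language Model Coordination | Proposes a reasoning-aware prompt orchestration framework for coordinated reasoning among multi-agent language models. | large language model, foundation model
5 | Drones that Think on their Feet: Sudden Landing Decisions with Embodied AI | Uses embodied AI to let drones make autonomous safe-landing decisions in sudden emergencies. | embodied AI
6 | CoLLM-NAS: Collaborative Large Language Models for Efficient Knowledge-Guided Neural Architecture Search | Proposes CoLLM-NAS, which uses collaborating large language models for efficient knowledge-guided neural architecture search. | large language model
7 | Better with Less: Small Proprietary Models Surpass Large Language Models in Financial Transaction Understanding | Small proprietary financial-transaction models outperform large language models and improve transaction understanding. | large language model
8 | BiasBusters: Uncovering and Mitigating Tool Selection Bias in Large Language Models | BiasBusters: uncovers and mitigates tool-selection bias in large language models. | large language model
9 | OffTopicEval: When Large Language Models Enter the Wrong Chat, Almost Always! | OffTopicEval: evaluates the safety of large language models placed in the wrong context, revealing insufficient generalization. | large language model
10 | TVS Sidekick: Challenges and Practical Insights from Deploying Large Language Models in the Enterprise | TVS Sidekick: challenges and practical insights from deploying large language models in the enterprise. | large language model
11 | STaR-Attack: A Spatio-Temporal and Narrative Reasoning Attack Framework for Unified Multimodal Understanding and Generation Models | Proposes the STaR-Attack framework, which exposes and exploits security vulnerabilities of unified multimodal models in spatio-temporal and narrative reasoning. | multimodal
12 | SeedPrints: Fingerprints Can Even Tell Which Seed Your Large Language Model Was Trained From | Proposes SeedPrints to address attribution verification for large language models. | large language model
13 | AI Playing Business Games: Benchmarking Large Language Models on Managerial Decision-Making in Dynamic Simulations | Proposes an LLM benchmarking framework based on business game simulations to evaluate dynamic managerial decision-making. | large language model
14 | MEDAKA: Construction of Biomedical Knowledge Graphs Using Large Language Models | MEDAKA: builds biomedical knowledge graphs with large language models to improve drug safety and recommendation. | large language model
15 | Evaluating the Use of Large Language Models as Synthetic Social Agents in Social Science Research | Evaluates the use of large language models as synthetic social agents in social science research, along with the associated caveats. | large language model
16 | DeepJSONEval: Benchmarking Complex Nested JSON Data Mining for Large Language Models | DeepJSONEval: a new benchmark for evaluating LLMs on complex nested JSON data mining. | large language model
17 | Galton's Law of Mediocrity: Why Large Language Models Regress to the Mean and Fail at Creativity in Advertising | Reveals a "Galton's law" phenomenon in which large language models regress toward mediocrity in advertising creativity. | large language model
18 | SOCK: A Benchmark for Measuring Self-Replication in Large Language Models | SOCK: a standard benchmark for evaluating the self-replication capabilities of large language models. | large language model
19 | 90% Faster, 100% Code-Free: MLLM-Driven Zero-Code 3D Game Development | UniGen: an MLLM-based zero-code 3D game development framework that makes development 90% faster. | large language model, multimodal
20 | SafeMind: Benchmarking and Mitigating Safety Risks in Embodied LLM Agents | Proposes SafeMindBench and SafeMindAgent to evaluate and mitigate the safety risks of embodied LLM agents. | large language model, multimodal
21 | LLM-based Multi-Agent Blackboard System for Information Discovery in Data Science | Proposes an LLM-based multi-agent blackboard system to tackle information discovery in data science. | large language model
22 | AgentFlux: Decoupled Fine-Tuning & Inference for On-Device Agentic Systems | AgentFlux: decouples fine-tuning and inference for on-device agent systems, improving tool-calling accuracy. | large language model
23 | The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain | Proposes Dragon Hatchling: a biologically inspired, interpretable Transformer-like language model. | large language model
24 | Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents | Reveals the misevolution risks of self-evolving LLM agents and proposes a systematic evaluation framework. | large language model
25 | Lita: Light Agent Uncovers the Agentic Coding Capabilities of LLMs | Lita: a lightweight agent that reveals the agentic coding capabilities of LLMs. | large language model
26 | Collaborative Compression for Large-Scale MoE Deployment on Edge | Proposes a collaborative compression framework for efficient deployment of ultra-large MoE models on edge devices. | large language model
27 | ICL Optimized Fragility | ICL optimization improves general knowledge ability but reduces the robustness of complex reasoning. | chain-of-thought
28 | Data driven approaches in nanophotonics: A review of AI-enabled metadevices | A review of AI-driven nanophotonics, covering data-driven methods for designing metadevices. | large language model
29 | Rearchitecting Datacenter Lifecycle for AI: A TCO-Driven Framework | Proposes a TCO-driven framework for the AI datacenter lifecycle, optimizing the build, refresh, and operation stages. | large language model
30 | Communication-Efficient and Accurate Approach for Aggregation in Federated Low-Rank Adaptation | Proposes FLoRA-NA to address communication efficiency and accuracy in federated low-rank adaptation. | foundation model
31 | Game-Time: Evaluating Temporal Dynamics in Spoken Language Models | Proposes the Game-Time benchmark to evaluate the temporal dynamics of conversational spoken language models. | instruction following
32 | Interactive Learning for LLM Reasoning | Proposes the ILR framework, which improves LLMs' independent reasoning ability through interactive learning. | large language model
33 | SlimPack: Fine-Grained Asymmetric Packing for Balanced and Efficient Variable-Length LLM Training | SlimPack: fine-grained asymmetric data packing for variable-length LLM training, improving balance and efficiency. | large language model
34 | Human-Centered Evaluation of RAG outputs: a framework and questionnaire for human-AI collaboration | Proposes a human-centered framework and questionnaire for evaluating RAG outputs, improving human-AI collaboration. | large language model
35 | LLM Agents for Knowledge Discovery in Atomic Layer Processing | Uses LLM agents for knowledge discovery in atomic layer processing. | large language model
36 | Toward an Unbiased Collective Memory for Efficient LLM-Based Agentic 6G Cross-Domain Management | Proposes an unbiased collective memory framework for efficient LLM-based agentic 6G cross-domain management. | large language model
37 | 'Too much alignment; not enough culture': Re-balancing cultural alignment practices in LLMs | Proposes the concept of "thick outputs" to rebalance cultural alignment practices in large language models. | large language model
38 | Judging by Appearances? Auditing and Intervening Vision-Language Models for Bail Prediction | Audits and intervenes in vision-language models to improve the fairness and accuracy of bail prediction. | large language model
39 | SafeEvalAgent: Toward Agentic and Self-Evolving Safety Evaluation of LLMs | Proposes SafeEvalAgent, enabling self-evolving LLM safety evaluation with dynamic benchmark generation. | large language model
40 | Accelerating LLM Inference with Precomputed Query Storage | StorInfer: accelerates LLM inference with precomputed query storage, particularly in resource-constrained environments. | large language model
41 | Chain-in-Tree: Back to Sequential Reasoning in LLM Tree Search | Chain-in-Tree: improves the efficiency of LLM tree search through a dynamic branching strategy. | large language model
42 | HNote: Extending YNote with Hexadecimal Encoding for Fine-Tuning LLMs in Music Modeling | Proposes HNote: a music representation based on hexadecimal encoding for fine-tuning LLMs on music modeling. | large language model
43 | CustomIR: Unsupervised Fine-Tuning of Dense Embeddings for Known Document Corpora | CustomIR: uses unsupervised fine-tuning to improve dense embeddings for domain document corpora. | large language model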

🔬 Pillar 2: RL & Architecture (13 papers)

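# | Title | Key point | Tags | 🔗
44 | OWL: Geometry-Aware Spatial Reasoning for Audio Large Language Models | Proposes the OWL model, which uses geometry-aware spatial reasoning to improve how accurately audio large language models perceive sound direction and distance. | curriculum learning, PULSE, large language model
45 | Deep Reinforcement Learning-Based Precoding for Multi-RIS-Aided Multiuser Downlink Systems with Practical Phase Shift | Proposes a DDPG-based precoding scheme for multi-RIS-aided multiuser downlinks to optimize spectral efficiency. | reinforcement learning, deep reinforcement learning, DRL
46 | Planner-R1: Reward Shaping Enables Efficient Agentic RL with Smaller LLMs | Proposes Planner-R1 to improve the efficiency of smaller LLMs in agentic RL. | curriculum learning, reward shaping, large language model
47 | Scaling Homomorphic Applications in Deployment | Improves the scalability of homomorphic encryption applications through deployment optimization, using movie recommendation as an example. | reinforcement learning, OMOMO
48 | R-Log: Incentivizing Log Analysis Capability in LLMs via Reasoning-based Reinforcement Learning | Proposes R-Log to address LLMs' insufficient capability in log analysis. | reinforcement learning, large language model
49 | RoRecomp: Enhancing Reasoning Efficiency via Rollout Response Recomposition in Reinforcement Learning | Proposes RoRecomp, which improves LLM reasoning efficiency in reinforcement learning by recomposing rollout responses. | reinforcement learning, large language model
50 | Boosting Process-Correct CoT Reasoning by Modeling Solvability of Multiple-Choice QA | Improves process-correct CoT reasoning by modeling the solvability of multiple-choice questions. | reinforcement learning, large language model, multimodal
51 | Iterative Residual Cross-Attention Mechanism: An Integrated Approach for Audio-Visual Navigation Tasks | Proposes IRCAM-AVN to address redundancy and inconsistency in information fusion and sequence modeling for audio-visual navigation. | reinforcement learning, egocentric, multimodal
52 | MAGIC-MASK: Multi-Agent Guided Inter-Agent Collaboration with Mask-Based Explainability for Reinforcement Learning | MAGIC-MASK: a multi-agent reinforcement learning collaboration framework with mask-based explainability. | reinforcement learning, deep reinforcement learning
53 | Diversity-Incentivized Exploration for Versatile Reasoning | DIVER: improves LLMs' versatile reasoning ability through diversity-incentivized exploration. | reinforcement learning, reward shaping, large language model
54 | CWM: An Open-Weights LLM for Research on Code Generation with World Models | Releases CWM: an open-weights LLM for research on code generation with world models. | world model
55 | Fine-tuning Behavioral Cloning Policies with Preference-Based Reinforcement Learning | Proposes the BRIDGE algorithm, which fine-tunes robot policies by combining offline expert data with online preference learning. | reinforcement learning
56 | Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training | Reveals the attention heads that emerge during post-training of reasoning models and their effect on complex reasoning. | reinforcement learning, distillation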
56 Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training 揭示推理模型后训练中涌现的注意力头及其对复杂推理的影响 reinforcement learning distillation

🔬 Pillar 1: Robot Control (2 papers)

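# | Title | Key point | Tags | 🔗
57 | SafeBehavior: Simulating Human-Like Multistage Reasoning to Mitigate Jailbreak Attacks in Large Language Models | SafeBehavior: simulates human-like multistage reasoning to mitigate jailbreak attacks on large language models. | manipulation, large language model
58 | SCUBA: Salesforce Computer Use Benchmark | SCUBA: a computer-use benchmark on the Salesforce platform that evaluates agents automating CRM workflows. | manipulation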

🔬 Pillar 3: Perception & Semantics (1 paper)

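# | Title | Key point | Tags | 🔗
59 | Uncovering Zero-Shot Generalization Gaps in Time-Series Foundation Models Using Real-World Videos | Proposes the REAL-V-TSFM dataset, revealing the generalization gap of time-series foundation models on real-world video data. | optical flow, foundation model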
