cs.AI (2025-09-30)

📊 59 papers in total | 🔗 8 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (43 🔗7) · Pillar 2: RL & Architecture (13 🔗1) · Pillar 1: Robot Control (2) · Pillar 3: Perception & Semantics (1)

🔬 Pillar 9: Embodied Foundation Models (43 papers)

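| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Towards Unified Multimodal Misinformation Detection in Social Media: A Benchmark Dataset and Baseline | Proposes the OmniFake dataset and the UMFDet framework for unified detection of multimodal misinformation on social media. | multimodal, chain-of-thought | |
| 2 | CHAI: Command Hijacking against embodied AI | CHAI: a command-hijacking attack against embodied AI. | embodied AI, multimodal | |
| 3 | Emergent evaluation hubs in a decentralizing large language model ecosystem | Reveals the emergence of benchmark evaluation hubs, and a trend toward centralization, in the large language model ecosystem. | large language model, foundation model | |
| 4 | Reasoning-Aware Prompt Orchestration: A Foundation Model for Multi-Agent Language Model Coordination | Proposes a reasoning-aware prompt orchestration framework for collaborative reasoning among multi-agent language models. | large language model, foundation model | |
| 5 | Drones that Think on their Feet: Sudden Landing Decisions with Embodied AI | Uses embodied AI to let drones make autonomous safe-landing decisions in sudden emergencies. | embodied AI | |
| 6 | CoLLM-NAS: Collaborative Large Language Models for Efficient Knowledge-Guided Neural Architecture Search | Proposes CoLLM-NAS, which uses collaborating large language models for efficient knowledge-guided neural architecture search. | large language model | |
| 7 | Better with Less: Small Proprietary Models Surpass Large Language Models in Financial Transaction Understanding | Small proprietary models outperform large language models on financial transaction understanding. | large language model | |
| 8 | BiasBusters: Uncovering and Mitigating Tool Selection Bias in Large Language Models | BiasBusters: uncovers and mitigates tool-selection bias in large language models. | large language model | |
| 9 | OffTopicEval: When Large Language Models Enter the Wrong Chat, Almost Always! | OffTopicEval: evaluates the safety of large language models in off-topic scenarios and reveals their insufficient generalization. | large language model | |
| 10 | TVS Sidekick: Challenges and Practical Insights from Deploying Large Language Models in the Enterprise | TVS Sidekick: challenges and practical insights from deploying large language models in the enterprise. | large language model | |
| 11 | STaR-Attack: A Spatio-Temporal and Narrative Reasoning Attack Framework for Unified Multimodal Understanding and Generation Models | Proposes STaR-Attack, a multi-turn spatio-temporal and narrative reasoning attack framework that targets the generation-understanding coupling vulnerability of unified multimodal models. | multimodal | |
| 12 | SeedPrints: Fingerprints Can Even Tell Which Seed Your Large Language Model Was Trained From | Proposes SeedPrints, which traces the provenance of large language models through their initialization biases. | large language model | |
| 13 | AI Playing Business Games: Benchmarking Large Language Models on Managerial Decision-Making in Dynamic Simulations | Proposes an LLM benchmark based on business game simulations to evaluate managerial decision-making in dynamic settings. | large language model | |
| 14 | MEDAKA: Construction of Biomedical Knowledge Graphs Using Large Language Models | MEDAKA: uses large language models to build biomedical knowledge graphs, improving drug safety and recommendation. | large language model | |
| 15 | Evaluating the Use of Large Language Models as Synthetic Social Agents in Social Science Research | Evaluates the use of large language models as synthetic social agents in social science research. | large language model | |
| 16 | DeepJSONEval: Benchmarking Complex Nested JSON Data Mining for Large Language Models | DeepJSONEval: proposes a benchmark for mining deeply nested JSON data, evaluating LLMs on complex structured data processing. | large language model | |
| 17 | Galton's Law of Mediocrity: Why Large Language Models Regress to the Mean and Fail at Creativity in Advertising | Reveals a "Galton's law" phenomenon in which large language models regress toward mediocrity in advertising creativity. | large language model | |
| 18 | SOCK: A Benchmark for Measuring Self-Replication in Large Language Models | SOCK: a standard benchmark for evaluating the self-replication capabilities of large language models. | large language model | |
| 19 | 90% Faster, 100% Code-Free: MLLM-Driven Zero-Code 3D Game Development | UniGen: an MLLM-based zero-code 3D game development framework that is 90% faster. | large language model, multimodal | |
| 20 | SafeMind: Benchmarking and Mitigating Safety Risks in Embodied LLM Agents | Proposes SafeMindBench and SafeMindAgent to evaluate and mitigate the safety risks of embodied LLM agents. | large language model, multimodal | |
| 21 | LLM-Based Multi-Agent Blackboard System for Information Discovery in Data Science | Proposes an LLM-based multi-agent blackboard system to address information discovery in data science. | large language model | |
| 22 | AgentFlux: Decoupled Fine-Tuning & Inference for On-Device Agentic Systems | AgentFlux: decouples fine-tuning and inference to enable efficient tool calling in on-device agent systems. | large language model | |
| 23 | The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain | Proposes the Dragon Hatchling model, bridging the gap between the Transformer and models of the brain. | large language model | |
| 24 | Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents | Reveals the misevolution risks of self-evolving LLM agents and proposes a systematic evaluation framework. | large language model | |
| 25 | Lita: Light Agent Uncovers the Agentic Coding Capabilities of LLMs | Lita: a lightweight agent that reveals the agentic coding capabilities of LLMs. | large language model | |
| 26 | Collaborative Compression for Large-Scale MoE Deployment on Edge | Proposes a collaborative compression framework for efficient deployment of very large MoE models on edge devices. | large language model | |
| 27 | ICL Optimized Fragility | ICL optimization improves general knowledge ability but reduces flexibility in complex reasoning. | chain-of-thought | |
| 28 | Data driven approaches in nanophotonics: A review of AI-enabled metadevices | Review: AI-driven design and optimization of nanophotonic metadevices. | large language model | |
| 29 | Rearchitecting Datacenter Lifecycle for AI: A TCO-Driven Framework | Proposes a TCO-driven framework for the AI datacenter lifecycle, optimizing the build, refresh, and operation phases. | large language model | |
| 30 | Communication-Efficient and Accurate Approach for Aggregation in Federated Low-Rank Adaptation | Proposes FLoRA-NA to address communication efficiency in federated low-rank adaptation. | foundation model | |
| 31 | Game-Time: Evaluating Temporal Dynamics in Spoken Language Models | Proposes the Game-Time benchmark to evaluate the temporal dynamics of conversational spoken language models. | instruction following | |
| 32 | Interactive Learning for LLM Reasoning | Proposes the ILR framework, which improves the independent reasoning ability of LLMs through interactive learning. | large language model | |
| 33 | SlimPack: Fine-Grained Asymmetric Packing for Balanced and Efficient Variable-Length LLM Training | SlimPack: a fine-grained asymmetric data packing framework for balanced, efficient variable-length LLM training. | large language model | |
| 34 | Human-Centered Evaluation of RAG outputs: a framework and questionnaire for human-AI collaboration | Proposes a human-centered evaluation framework to improve RAG system outputs. | large language model | |
| 35 | LLM Agents for Knowledge Discovery in Atomic Layer Processing | Uses LLM agents for knowledge discovery in atomic layer processing. | large language model | |
| 36 | Toward an Unbiased Collective Memory for Efficient LLM-Based Agentic 6G Cross-Domain Management | Proposes an unbiased collective memory framework based on LLM agents for efficient 6G cross-domain resource management. | large language model | |
| 37 | 'Too much alignment; not enough culture': Re-balancing cultural alignment practices in LLMs | Proposes a cultural alignment approach to address insufficient cultural sensitivity in LLMs. | large language model | |
| 38 | Judging by Appearances? Auditing and Intervening Vision-Language Models for Bail Prediction | Proposes auditing and intervention methods for bail prediction with vision-language models, improving fairness. | large language model | |
| 39 | SafeEvalAgent: Toward Agentic and Self-Evolving Safety Evaluation of LLMs | Proposes SafeEvalAgent to address the lack of dynamism in LLM safety evaluation. | large language model | |
| 40 | Accelerating LLM Inference with Precomputed Query Storage | StorInfer: accelerates LLM inference with precomputed query storage, especially in resource-constrained environments. | large language model | |
| 41 | Chain-in-Tree: Back to Sequential Reasoning in LLM Tree Search | Chain-in-Tree: improves the efficiency of LLM tree search through a dynamic branching strategy. | large language model | |
| 42 | HNote: Extending YNote with Hexadecimal Encoding for Fine-Tuning LLMs in Music Modeling | Proposes HNote, a music representation based on hexadecimal encoding, for fine-tuning LLMs for music modeling. | large language model | |
| 43 | CustomIR: Unsupervised Fine-Tuning of Dense Embeddings for Known Document Corpora | CustomIR: uses unsupervised fine-tuning to improve dense-embedding retrieval over domain document corpora. | large language model | |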

🔬 Pillar 2: RL & Architecture (13 papers)

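| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 44 | OWL: Geometry-Aware Spatial Reasoning for Audio Large Language Models | Proposes the OWL model, which uses geometry-aware spatial reasoning to improve the accuracy and interpretability of sound source localization in audio large language models. | curriculum learning, PULSE, large language model | |
| 45 | Deep Reinforcement Learning-Based Precoding for Multi-RIS-Aided Multiuser Downlink Systems with Practical Phase Shift | Proposes a precoding method based on deep reinforcement learning to optimize multiuser downlink systems. | reinforcement learning, deep reinforcement learning, DRL | |
| 46 | Planner-R1: Reward Shaping Enables Efficient Agentic RL with Smaller LLMs | Proposes Planner-R1 to improve the efficiency of small LLMs in agentic RL. | curriculum learning, reward shaping, large language model | |
| 47 | Scaling Homomorphic Applications in Deployment | Improves the scalability of homomorphic encryption applications through deployment optimization. | reinforcement learning, OMOMO | |
| 48 | R-Log: Incentivizing Log Analysis Capability in LLMs via Reasoning-based Reinforcement Learning | R-Log: improves the log-analysis capability of LLMs through reasoning-based reinforcement learning. | reinforcement learning, large language model | |
| 49 | RoRecomp: Enhancing Reasoning Efficiency via Rollout Response Recomposition in Reinforcement Learning | Proposes RoRecomp, which improves LLM reasoning efficiency in reinforcement learning by recomposing rollout responses. | reinforcement learning, large language model | |
| 50 | Boosting Process-Correct CoT Reasoning by Modeling Solvability of Multiple-Choice QA | Improves process-correct CoT reasoning by modeling the solvability of multiple-choice questions. | reinforcement learning, large language model, multimodal | |
| 51 | Iterative Residual Cross-Attention Mechanism: An Integrated Approach for Audio-Visual Navigation Tasks | Proposes IRCAM-AVN to address redundancy and inconsistency in information fusion and sequence modeling for audio-visual navigation. | reinforcement learning, egocentric, multimodal | |
| 52 | MAGIC-MASK: Multi-Agent Guided Inter-Agent Collaboration with Mask-Based Explainability for Reinforcement Learning | MAGIC-MASK: a multi-agent reinforcement learning collaboration framework with mask-based explainability. | reinforcement learning, deep reinforcement learning | |
| 53 | Diversity-Incentivized Exploration for Versatile Reasoning | DIVER: improves the versatile reasoning ability of LLMs through diversity-incentivized exploration. | reinforcement learning, reward shaping, large language model | |
| 54 | CWM: An Open-Weights LLM for Research on Code Generation with World Models | Releases CWM, an open-weights LLM for research on code generation with world models. | world model | |
| 55 | Fine-tuning Behavioral Cloning Policies with Preference-Based Reinforcement Learning | Proposes the BRIDGE algorithm, which combines offline expert data with online preference learning to fine-tune policies, improving robot control efficiency. | reinforcement learning | |
| 56 | Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training | Reveals attention heads that emerge during post-training of reasoning models and are key to structured reasoning and computation. | reinforcement learning, distillation | |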

🔬 Pillar 1: Robot Control (2 papers)

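| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 57 | SafeBehavior: Simulating Human-Like Multistage Reasoning to Mitigate Jailbreak Attacks in Large Language Models | SafeBehavior: simulates human-like multistage reasoning to defend large language models against jailbreak attacks. | manipulation, large language model | |
| 58 | SCUBA: Salesforce Computer Use Benchmark | SCUBA: a computer-use benchmark on the Salesforce platform for evaluating agents that automate CRM workflows. | manipulation | |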

🔬 Pillar 3: Perception & Semantics (1 paper)

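| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 59 | Uncovering Zero-Shot Generalization Gaps in Time-Series Foundation Models Using Real-World Videos | Proposes the REAL-V-TSFM dataset, revealing generalization gaps of time-series foundation models on real-world video data. | optical flow, foundation model | |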
