cs.AI (2025-09-30)

📊 59 papers in total | 🔗 8 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (43 🔗7) | Pillar 2: RL & Architecture (13 🔗1) | Pillar 1: Robot Control (2) | Pillar 3: Perception & Semantics (1)

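🔬 Pillar 9: Embodied Foundation Models (43 papers)

#	Title	One-Line Summary	Tags	🔗
1	Towards Unified Multimodal Misinformation Detection in Social Media: A Benchmark Dataset and Baseline	Proposes the OmniFake dataset and the UMFDet framework to jointly detect human-crafted and AI-generated multimodal misinformation on social media.	multimodal chain-of-thought
2	CHAI: Command Hijacking against embodied AI	Introduces CHAI to address the problem of command hijacking against embodied AI.	embodied AI multimodal
3	Emergent evaluation hubs in a decentralizing large language model ecosystem	Reveals a trend toward centralization of evaluation benchmarks in the large language model ecosystem and examines its impact.	large language model foundation model
4	Reasoning-Aware Prompt Orchestration: A Foundation Model for Multi-Agent Language Model Coordination	Proposes a reasoning-aware prompt orchestration framework for collaborative reasoning among multiple language model agents.	large language model foundation model
5	Drones that Think on their Feet: Sudden Landing Decisions with Embodied AI	Uses embodied AI to let drones make autonomous safe-landing decisions in sudden emergencies.	embodied AI
6	CoLLM-NAS: Collaborative Large Language Models for Efficient Knowledge-Guided Neural Architecture Search	Proposes CoLLM-NAS, which uses collaborating large language models for efficient knowledge-guided neural architecture search.	large language model
7	Better with Less: Small Proprietary Models Surpass Large Language Models in Financial Transaction Understanding	Small proprietary models for financial transactions outperform large language models at transaction understanding.	large language model
8	BiasBusters: Uncovering and Mitigating Tool Selection Bias in Large Language Models	BiasBusters: uncovers and mitigates tool selection bias in large language models.	large language model
9	OffTopicEval: When Large Language Models Enter the Wrong Chat, Almost Always!	OffTopicEval: evaluates the safety of large language models in off-topic settings and reveals insufficient generalization.	large language model
10	TVS Sidekick: Challenges and Practical Insights from Deploying Large Language Models in the Enterprise	TVS Sidekick: challenges and practical insights from deploying large language models in the enterprise.	large language model
11	STaR-Attack: A Spatio-Temporal and Narrative Reasoning Attack Framework for Unified Multimodal Understanding and Generation Models	Proposes the STaR-Attack framework, which exposes and exploits safety vulnerabilities of unified multimodal models in spatio-temporal and narrative reasoning.	multimodal
12	SeedPrints: Fingerprints Can Even Tell Which Seed Your Large Language Model Was Trained From	Proposes SeedPrints to address attribution verification for large language models.	large language model
13	AI Playing Business Games: Benchmarking Large Language Models on Managerial Decision-Making in Dynamic Simulations	Proposes an LLM benchmarking framework based on business game simulations to evaluate managerial decision-making in dynamic settings.	large language model
14	MEDAKA: Construction of Biomedical Knowledge Graphs Using Large Language Models	MEDAKA: builds biomedical knowledge graphs with large language models to improve drug safety and recommendation.	large language model	✅
15	Evaluating the Use of Large Language Models as Synthetic Social Agents in Social Science Research	Evaluates the use of large language models as synthetic social agents in social science research, along with the associated caveats.	large language model
16	DeepJSONEval: Benchmarking Complex Nested JSON Data Mining for Large Language Models	DeepJSONEval: a new benchmark for evaluating LLMs on complex nested JSON data mining.	large language model	✅
17	Galton's Law of Mediocrity: Why Large Language Models Regress to the Mean and Fail at Creativity in Advertising	Reveals a "Galton's law" effect in which large language models regress toward mediocrity in advertising creativity.	large language model
18	SOCK: A Benchmark for Measuring Self-Replication in Large Language Models	SOCK: a standard benchmark for evaluating self-replication capabilities in large language models.	large language model
19	90% Faster, 100% Code-Free: MLLM-Driven Zero-Code 3D Game Development	UniGen: an MLLM-based zero-code 3D game development framework that makes development 90% faster.	large language model multimodal	✅
20	SafeMind: Benchmarking and Mitigating Safety Risks in Embodied LLM Agents	Proposes SafeMindBench and SafeMindAgent to evaluate and mitigate safety risks in embodied LLM agents.	large language model multimodal
21	LLM-based Multi-Agent Blackboard System for Information Discovery in Data Science	Proposes an LLM-based multi-agent blackboard system for information discovery in data science.	large language model
22	AgentFlux: Decoupled Fine-Tuning & Inference for On-Device Agentic Systems	AgentFlux: decouples fine-tuning and inference for on-device agentic systems, improving tool-calling accuracy.	large language model
23	The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain	Proposes Dragon Hatchling, a biologically inspired, interpretable Transformer-like language model.	large language model
24	Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents	Reveals misevolution risks in self-evolving LLM agents and proposes a systematic evaluation framework.	large language model	✅
25	Lita: Light Agent Uncovers the Agentic Coding Capabilities of LLMs	Lita: a lightweight agent that reveals the agentic coding capabilities of LLMs.	large language model
26	Collaborative Compression for Large-Scale MoE Deployment on Edge	Proposes a collaborative compression framework for efficient deployment of very large MoE models on edge devices.	large language model
27	ICL Optimized Fragility	ICL optimization improves general-knowledge performance but reduces robustness on complex reasoning.	chain-of-thought
28	Data driven approaches in nanophotonics: A review of AI-enabled metadevices	Review: AI-driven nanophotonics, using data-driven methods to design metadevices.	large language model
29	Rearchitecting Datacenter Lifecycle for AI: A TCO-Driven Framework	Proposes a TCO-driven framework for the AI datacenter lifecycle, optimizing the build, refresh, and operation stages.	large language model
30	Communication-Efficient and Accurate Approach for Aggregation in Federated Low-Rank Adaptation	Proposes FLoRA-NA to address communication efficiency and accuracy in federated low-rank adaptation.	foundation model
31	Game-Time: Evaluating Temporal Dynamics in Spoken Language Models	Proposes the Game-Time benchmark to evaluate temporal dynamics in conversational spoken language models.	instruction following	✅
32	Interactive Learning for LLM Reasoning	Proposes the ILR framework, which improves the independent reasoning ability of LLMs through interactive learning.	large language model
33	SlimPack: Fine-Grained Asymmetric Packing for Balanced and Efficient Variable-Length LLM Training	SlimPack: fine-grained asymmetric data packing for variable-length LLM training, improving balance and efficiency.	large language model
34	Human-Centered Evaluation of RAG outputs: a framework and questionnaire for human-AI collaboration	Proposes a human-centered framework and questionnaire for evaluating RAG outputs to improve human-AI collaboration.	large language model
35	LLM Agents for Knowledge Discovery in Atomic Layer Processing	Uses LLM agents for knowledge discovery in atomic layer processing.	large language model
36	Toward an Unbiased Collective Memory for Efficient LLM-Based Agentic 6G Cross-Domain Management	Proposes an unbiased collective memory framework for efficient LLM-based agentic 6G cross-domain management.	large language model	✅
37	'Too much alignment; not enough culture': Re-balancing cultural alignment practices in LLMs	Proposes the concept of "thick outputs" to rebalance cultural alignment practices in large language models.	large language model
38	Judging by Appearances? Auditing and Intervening Vision-Language Models for Bail Prediction	Audits and intervenes on vision-language models to improve the fairness and accuracy of bail prediction.	large language model
39	SafeEvalAgent: Toward Agentic and Self-Evolving Safety Evaluation of LLMs	Proposes SafeEvalAgent for self-evolving LLM safety evaluation with dynamic benchmark generation.	large language model
40	Accelerating LLM Inference with Precomputed Query Storage	StorInfer: accelerates LLM inference with precomputed query storage, particularly in resource-constrained environments.	large language model
41	Chain-in-Tree: Back to Sequential Reasoning in LLM Tree Search	Chain-in-Tree: improves the efficiency of LLM tree search through a dynamic branching strategy.	large language model	✅
42	HNote: Extending YNote with Hexadecimal Encoding for Fine-Tuning LLMs in Music Modeling	Proposes HNote, a hexadecimal-encoded music representation for fine-tuning LLMs on music modeling.	large language model
43	CustomIR: Unsupervised Fine-Tuning of Dense Embeddings for Known Document Corpora	CustomIR: uses unsupervised fine-tuning to improve dense embeddings for domain document corpora.	large language model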

🔬 Pillar 2: RL & Architecture (13 papers)

#	Title	One-Line Summary	Tags	🔗
44	OWL: Geometry-Aware Spatial Reasoning for Audio Large Language Models	Proposes the OWL model, which uses geometry-aware spatial reasoning to improve audio large language models' perception of sound direction and distance.	curriculum learning PULSE large language model
45	Deep Reinforcement Learning-Based Precoding for Multi-RIS-Aided Multiuser Downlink Systems with Practical Phase Shift	Proposes a DDPG-based precoding scheme for multi-RIS-aided multiuser downlink systems to optimize spectral efficiency.	reinforcement learning deep reinforcement learning DRL
46	Planner-R1: Reward Shaping Enables Efficient Agentic RL with Smaller LLMs	Proposes Planner-R1 to improve the efficiency of smaller LLMs in agentic RL.	curriculum learning reward shaping large language model
47	Scaling Homomorphic Applications in Deployment	Improves the scalability of homomorphic encryption applications through deployment optimizations, using movie recommendation as a case study.	reinforcement learning OMOMO
48	R-Log: Incentivizing Log Analysis Capability in LLMs via Reasoning-based Reinforcement Learning	Proposes R-Log to address the limited log analysis capability of LLMs.	reinforcement learning large language model
49	RoRecomp: Enhancing Reasoning Efficiency via Rollout Response Recomposition in Reinforcement Learning	Proposes RoRecomp, which recomposes rollout responses to improve LLM reasoning efficiency in reinforcement learning.	reinforcement learning large language model
50	Boosting Process-Correct CoT Reasoning by Modeling Solvability of Multiple-Choice QA	Improves process-correct CoT reasoning by modeling the solvability of multiple-choice questions.	reinforcement learning large language model multimodal
51	Iterative Residual Cross-Attention Mechanism: An Integrated Approach for Audio-Visual Navigation Tasks	Proposes IRCAM-AVN to address redundancy and inconsistency in information fusion and sequence modeling for audio-visual navigation.	reinforcement learning egocentric multimodal
52	MAGIC-MASK: Multi-Agent Guided Inter-Agent Collaboration with Mask-Based Explainability for Reinforcement Learning	MAGIC-MASK: a collaborative multi-agent reinforcement learning framework with mask-based explainability.	reinforcement learning deep reinforcement learning
53	Diversity-Incentivized Exploration for Versatile Reasoning	DIVER: improves the versatile reasoning ability of LLMs through diversity-incentivized exploration.	reinforcement learning reward shaping large language model	✅
54	CWM: An Open-Weights LLM for Research on Code Generation with World Models	Releases CWM, an open-weights LLM for research on code generation with world models.	world model
55	Fine-tuning Behavioral Cloning Policies with Preference-Based Reinforcement Learning	Proposes the BRIDGE algorithm, which fine-tunes robot policies by combining offline expert data with online preference learning.	reinforcement learning
56	Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training	Reveals attention heads that emerge during post-training of reasoning models and their effect on complex reasoning.	reinforcement learning distillation

🔬 Pillar 1: Robot Control (2 papers)

#	Title	One-Line Summary	Tags	🔗	⭐
57	SafeBehavior: Simulating Human-Like Multistage Reasoning to Mitigate Jailbreak Attacks in Large Language Models	SafeBehavior: simulates human-like multistage reasoning to mitigate jailbreak attacks on large language models.	manipulation large language model
58	SCUBA: Salesforce Computer Use Benchmark	SCUBA: a computer-use benchmark on the Salesforce platform for evaluating agents that automate CRM workflows.	manipulation

🔬 Pillar 3: Perception & Semantics (1 paper)

#	Title	One-Line Summary	Tags	🔗	⭐
59	Uncovering Zero-Shot Generalization Gaps in Time-Series Foundation Models Using Real-World Videos	Proposes the REAL-V-TSFM dataset, revealing generalization gaps of time-series foundation models on real-world video data.	optical flow foundation model
