cs.AI (2025-10-05)

📊 7 papers in total | 🔗 3 with code

🎯 Browse by interest area

#	Title	One-sentence summary	Tags	🔗
1	MacroBench: A Novel Testbed for Web Automation Scripts via Large Language Models	MacroBench: a testbed for LLM-based web automation scripts.	large language model	✅
2	Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation	Proposes a Bayesian framework for evaluating large language models that improves the stability and reliability of evaluation.	large language model	✅
3	AlphaApollo: Orchestrating Foundation Models and Professional Tools into a Self-Evolving System for Deep Agentic Reasoning	AlphaApollo: a self-evolving system that orchestrates foundation models and professional tools for deep agentic reasoning.	foundation model	✅
4	What Shapes a Creative Machine Mind? Comprehensively Benchmarking Creativity in Foundation Models	Proposes C^2-Eval to comprehensively evaluate foundation models on convergent and divergent creativity.	foundation model
5	LLM-Based Data Science Agents: A Survey of Capabilities, Challenges, and Future Directions	Survey of the capabilities, challenges, and future directions of LLM-based data science agents.	large language model multimodal

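#	Title	One-sentence summary	Tags	🔗	⭐
6	Internal World Models as Imagination Networks in Cognitive Agents	Proposes a psychological network analysis method to compare the internal world models of humans and large language models.	world model large language model
7	Selective Expert Guidance for Effective and Diverse Exploration in Reinforcement Learning of LLMs	MENTOR: selective expert guidance improves the effectiveness and diversity of exploration in reinforcement learning of LLMs.	reinforcement learning large language model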