| 1 |
Webscraper: Leverage Multimodal Large Language Models for Index-Content Web Scraping |
Webscraper: using multimodal large language models for index-content web scraping |
large language model multimodal |
|
|
| 2 |
Xuanwu: Evolving General Multimodal Models into an Industrial-Grade Foundation for Content Ecosystems |
Xuanwu VL-2B: an industrial-grade, general-purpose multimodal foundation model for content ecosystems |
foundation model multimodal |
|
|
| 3 |
AEC-Bench: A Multimodal Benchmark for Agentic Systems in Architecture, Engineering, and Construction |
AEC-Bench: a multimodal benchmark for agentic systems in architecture, engineering, and construction |
foundation model multimodal |
✅ |
|
| 4 |
Bethe Ansatz with a Large Language Model |
Uses a large language model to solve the coordinate Bethe Ansatz, discovering new integrable spin-chain models. |
large language model |
|
|
| 5 |
ScoringBench: A Benchmark for Evaluating Tabular Foundation Models with Proper Scoring Rules |
Proposes ScoringBench, which evaluates tabular foundation models with proper scoring rules to improve decision quality. |
foundation model |
✅ |
|
| 6 |
Spontaneous Functional Differentiation in Large Language Models: A Brain-Like Intelligence Economy |
Large language models develop emergent, spontaneous functional differentiation, forming a brain-like intelligence economy |
large language model |
|
|
| 7 |
KEditVis: A Visual Analytics System for Knowledge Editing of Large Language Models |
KEditVis: a visual analytics system for knowledge editing of large language models |
large language model |
|
|
| 8 |
Knowledge database development by large language models for countermeasures against viruses and marine toxins |
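Uses large language models to build knowledge bases on viruses and marine toxins, accelerating the development of medical countermeasures. |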
large language model |
|
|
| 9 |
SciVisAgentBench: A Benchmark for Evaluating Scientific Data Analysis and Visualization Agents |
Proposes SciVisAgentBench, a benchmark for evaluating scientific data analysis and visualization agents. |
large language model multimodal |
✅ |
|
| 10 |
ATP-Bench: Towards Agentic Tool Planning for MLLM Interleaved Generation |
Proposes ATP-Bench, a benchmark for evaluating the agentic tool planning ability of MLLMs in interleaved generation tasks. |
large language model multimodal |
✅ |
|
| 11 |
Software Vulnerability Detection Using a Lightweight Graph Neural Network |
Proposes VulGNN, a lightweight graph neural network for efficient software vulnerability detection. |
large language model |
|
|
| 12 |
Architecting Secure AI Agents: Perspectives on System-Level Defenses Against Indirect Prompt Injection Attacks |
Proposes a system-level defense architecture for AI agents against indirect prompt injection attacks |
large language model |
|
|
| 13 |
SISA: A Scale-In Systolic Array for GEMM Acceleration |
SISA: a scale-in systolic array for GEMM acceleration |
large language model |
|
|
| 14 |
Owl-AuraID 1.0: An Intelligent System for Autonomous Scientific Instrumentation and Scientific Data Analysis |
Owl-AuraID: an intelligent system for autonomous scientific instrumentation based on GUI-native operation |
multimodal |
✅ |
|
| 15 |
BotVerse: Real-Time Event-Driven Simulation of Social Agents |
BotVerse: a framework for real-time, event-driven social simulation built on LLM agents |
multimodal |
|
|
| 16 |
Measuring the metacognition of AI |
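Proposes using the meta-d' framework and signal detection theory to evaluate the metacognitive ability of AI, improving the reliability of AI decisions. |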
large language model |
|
|
| 17 |
Beyond the Steeper Curve: AI-Mediated Metacognitive Decoupling and the Limits of the Dunning-Kruger Metaphor |
AI-mediated metacognitive decoupling: going beyond the Dunning-Kruger effect to reveal the cognitive impact of LLM use |
large language model |
|
|
| 18 |
View-oriented Conversation Compiler for Agent Trace Analysis |
Proposes VCC, which compiles agent conversation logs into structured views to improve in-context learning. |
chain-of-thought |
|
|
| 19 |
An Empirical Study of Multi-Agent Collaboration for Automated Research |
Compares the performance and stability of multi-agent collaboration frameworks for automated research |
large language model |
|
|
| 20 |
ELT-Bench-Verified: Benchmark Quality Issues Underestimate AI Agent Capabilities |
ELT-Bench-Verified: shows that benchmark quality issues cause AI agent capabilities to be underestimated, and proposes fixes |
large language model |
|
|
| 21 |
Sima AIunty: Caste Audit in LLM-Driven Matchmaking |
Sima AIunty: an audit of caste-based bias in LLM-driven matchmaking |
large language model |
|
|
| 22 |
Route-Induced Density and Stability (RIDE): Controlled Intervention and Mechanism Analysis of Routing-Style Meta Prompts on LLM Internal States |
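RIDE: intervenes on and analyzes LLM internal states with routing-style meta prompts, revealing the relationship between density and stability |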
large language model |
|
|
| 23 |
SimMOF: AI agent for Automated MOF Simulations |
SimMOF: an LLM-based multi-agent framework that automates metal-organic framework (MOF) simulation workflows |
large language model |
|
|
| 24 |
REFINE: Real-world Exploration of Interactive Feedback and Student Behaviour |
REFINE: a real-world interactive feedback system for exploring feedback and student behavior |
large language model |
|
|
| 25 |
GISTBench: Evaluating LLM User Understanding via Evidence-Based Interest Verification |
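GISTBench: proposes a benchmark based on evidence-based user interest verification to evaluate how well LLMs understand users in recommender systems |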
large language model |
|
|
| 26 |
WybeCoder: Verified Imperative Code Generation |
Proposes WybeCoder, which verifies imperative code as it is generated, improving code quality. |
large language model |
|
|