| 1 |
MSEarth: A Multimodal Scientific Dataset and Benchmark for Phenomena Uncovering in Earth Science |
Introduces MSEarth, a multimodal scientific dataset and benchmark for understanding Earth science phenomena. |
large language model multimodal |
|
|
| 2 |
Privacy-Preserving Chest X-ray Report Generation via Multimodal Federated Learning with ViT and GPT-2 |
Proposes a multimodal federated learning framework based on ViT and GPT-2 for privacy-preserving chest X-ray report generation. |
multimodal |
|
|
| 3 |
WDMIR: Wavelet-Driven Multimodal Intent Recognition |
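Proposes the WDMIR framework, which uses wavelet analysis to enhance nonverbal information and improve multimodal intent recognition accuracy. |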
multimodal |
|
|
| 4 |
Large Language Models Miss the Multi-Agent Mark |
A critical analysis: applications of large language models in multi-agent systems deviate from the theoretical foundations. |
large language model |
|
|
| 5 |
Complex System Diagnostics Using a Knowledge Graph-Informed and Large Language Model-Enhanced Framework |
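Proposes a knowledge graph-informed, large language model-enhanced framework for complex system diagnostics, improving diagnostic capability for high-reliability systems such as nuclear power plants. |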
large language model |
|
|
| 6 |
Position is Power: System Prompts as a Mechanism of Bias in Large Language Models (LLMs) |
Reveals a system-prompt position bias in LLMs: where demographic information is placed affects model decisions. |
large language model |
|
|
| 7 |
StreamLink: Large-Language-Model Driven Distributed Data Engineering System |
StreamLink: a large language model-based distributed data engineering system that improves data processing efficiency and user experience. |
large language model |
|
|
| 8 |
CoderAgent: Simulating Student Behavior for Personalized Programming Learning with Large Language Models |
Proposes CoderAgent, which simulates student programming behavior to enable personalized programming learning. |
large language model |
|
|
| 9 |
Comparisons between a Large Language Model-based Real-Time Compound Diagnostic Medical AI Interface and Physicians for Common Internal Medicine Cases using Simulated Patients |
A large language model-based real-time compound diagnostic medical AI outperforms physicians on common internal medicine cases. |
large language model |
|
|
| 10 |
MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs |
MME-Reasoning: a comprehensive benchmark for evaluating the logical reasoning ability of multimodal large language models. |
large language model multimodal |
|
|
| 11 |
Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations |
ChemCoTBench: evaluates the chemical reasoning ability of LLMs through modular chemical operations. |
large language model chain-of-thought |
|
|
| 12 |
Policy Induction: Predicting Startup Success via Explainable Memory-Augmented In-Context Learning |
Proposes a policy induction method based on explainable memory-augmented in-context learning to predict startup success. |
large language model |
|
|
| 13 |
Scientific Paper Retrieval with LLM-Guided Semantic-Based Ranking |
SemRank: scientific paper retrieval using LLM-guided semantic ranking. |
large language model |
|
|
| 14 |
Make Planning Research Rigorous Again! |
Emphasizes rigor: brings lessons from classical planning into large language model planning to avoid repeating past mistakes. |
large language model |
|
|
| 15 |
The Feasibility of Topic-Based Watermarking on Academic Peer Reviews |
Proposes a topic-based watermarking method for tracing the provenance of LLM-generated text in academic peer reviews. |
large language model |
|
|
| 16 |
The Multilingual Divide and Its Impact on Global AI Safety |
Reveals the gap in multilingual AI capabilities and highlights its impact on, and challenges for, global AI safety. |
large language model |
|
|
| 17 |
Breaking the Ceiling: Exploring the Potential of Jailbreak Attacks through Expanding Strategy Space |
Breaks through the performance ceiling of jailbreak attacks on large language models by expanding the strategy space. |
large language model |
✅ |
|
| 18 |
Interpreting Social Bias in LVLMs via Information Flow Analysis and Multi-Round Dialogue Evaluation |
Proposes an information flow analysis and multi-round dialogue evaluation framework for interpreting social bias in LVLMs. |
multimodal |
|
|
| 19 |
Herd Behavior: Investigating Peer Influence in LLM-based Multi-Agent Systems |
Studies herd behavior in LLM-based multi-agent systems, revealing peer influence mechanisms and enabling controllable collaboration. |
large language model |
|
|
| 20 |
Agent-Environment Alignment via Automated Interface Generation |
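Proposes the ALIGN framework, which mitigates misalignment between LLM agents and their environments through automated interface generation. |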
large language model |
✅ |
|
| 21 |
AITEE -- Agentic Tutor for Electrical Engineering |
AITEE: an agentic tutor for electrical engineering that improves personalized learning and the application of domain knowledge. |
large language model |
|
|
| 22 |
Towards Conversational Development Environments: Using Theory-of-Mind and Multi-Agent Architectures for Requirements Refinement |
Proposes AlignMind, which uses theory of mind and a multi-agent architecture to improve software requirements refinement. |
foundation model |
|
|
| 23 |
RepoMaster: Autonomous Exploration and Understanding of GitHub Repositories for Complex Task Solving |
RepoMaster: autonomously explores and understands GitHub repositories to solve complex tasks. |
large language model |
✅ |
|
| 24 |
Step-Wise Formal Verification for LLM-Based Mathematical Problem Solving |
Proposes the MATH-VF framework for formally verifying the correctness of the LLM mathematical problem-solving process. |
large language model |
|
|
| 25 |
Respond to Change with Constancy: Instruction-tuning with LLM for Non-I.I.D. Network Traffic Classification |
Proposes the ETooL model, which uses LLM instruction tuning to tackle non-I.I.D. network traffic classification. |
large language model |
|
|
| 26 |
An LLM-as-Judge Metric for Bridging the Gap with Human Evaluation in SE Tasks |
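Proposes SE-Jury, an evaluation metric for software engineering tasks based on an ensemble of LLM judges that aligns more closely with human evaluation. |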
large language model |
|
|
| 27 |
Code Researcher: Deep Research Agent for Large Systems Code and Commit History |
Proposes Code Researcher, a deep research agent for large systems code and commit history. |
large language model |
|
|
| 28 |
GIFARC: Synthetic Dataset for Leveraging Human-Intuitive Analogies to Elevate AI Reasoning |
Proposes GIFARC, a synthetic dataset that leverages human-intuitive analogies to improve AI reasoning. |
large language model |
|
|
| 29 |
MIRROR: Multi-agent Intra- and Inter-Reflection for Optimized Reasoning in Tool Learning |
Proposes the MIRROR framework to optimize multi-agent reflection mechanisms in tool learning. |
large language model |
|
|