| 1 |
CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following |
CMI-Bench: a comprehensive music instruction-following benchmark for evaluating audio-text large language models on music information retrieval tasks. |
large language model instruction following |
|
|
| 2 |
Information Suppression in Large Language Models: Auditing, Quantifying, and Characterizing Censorship in DeepSeek |
Proposes an auditing framework that reveals information suppression in the DeepSeek large language model |
large language model chain-of-thought |
|
|
| 3 |
DeepSeq: High-Throughput Single-Cell RNA Sequencing Data Labeling via Web Search-Augmented Agentic Generative AI Foundation Models |
DeepSeq: high-throughput labeling of single-cell RNA sequencing data using web search-augmented agentic generative AI foundation models |
foundation model |
|
|
| 4 |
A Comprehensive Survey of Deep Research: Systems, Methodologies, and Applications |
A comprehensive survey of AI-based deep research, covering systems, methodologies, and applications |
large language model foundation model multimodal |
✅ |
|
| 5 |
MEraser: An Effective Fingerprint Erasure Approach for Large Language Models |
Proposes MEraser to effectively erase fingerprints from large language models |
large language model |
|
|
| 6 |
Automated Heuristic Design for Unit Commitment Using Large Language Models |
Proposes a large language model-based FunSearch method for automatically designing unit commitment schemes for power systems |
large language model |
|
|
| 7 |
DinoCompanion: An Attachment-Theory Informed Multimodal Robot for Emotionally Responsive Child-AI Interaction |
DinoCompanion: an attachment-theory-based multimodal robot for emotionally responsive child-AI interaction |
multimodal |
|
|
| 8 |
MALM: A Multi-Information Adapter for Large Language Models to Mitigate Hallucination |
Proposes MALM, a multi-information adapter that uses multi-graph learning to mitigate hallucination in large language models |
large language model |
|
|
| 9 |
CORONA: A Coarse-to-Fine Framework for Graph-based Recommendation with Large Language Models |
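Proposes the CORONA framework, which uses large language models for coarse-to-fine candidate filtering in graph-based recommendation. |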
large language model |
|
|
| 10 |
Step-by-Step Reasoning Attack: Revealing 'Erased' Knowledge in Large Language Models |
Proposes the Sleek attack, which uses step-by-step reasoning to expose weaknesses in knowledge erasure for large language models |
large language model |
|
|
| 11 |
Model Merging for Knowledge Editing |
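Proposes a model-merging-based knowledge editing framework that improves LLM performance in sequential editing while preserving general capabilities |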
large language model foundation model |
✅ |
|
| 12 |
Behavioral Generative Agents for Energy Operations |
Proposes a generative-agent-based approach to modeling consumer behavior in energy operations |
large language model |
|
|
| 13 |
Evaluating AI Alignment in Eleven LLMs through Output-Based Analysis and Human Benchmarking |
PAPERS framework: evaluating AI alignment in 11 LLMs through output analysis and human benchmarking |
large language model |
|
|
| 14 |
Graph of Verification: Structured Verification of LLM Reasoning with Directed Acyclic Graphs |
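Proposes the GoV framework, which structures the verification of LLM reasoning as a directed acyclic graph to improve the adaptability and precision of verification. |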
large language model |
|
|
| 15 |
SheetMind: An End-to-End LLM-Powered Multi-Agent Framework for Spreadsheet Automation |
SheetMind: an LLM-based multi-agent framework for spreadsheet automation |
large language model |
|
|
| 16 |
The Foundation Cracks: A Comprehensive Study on Bugs and Testing Practices in LLM Libraries |
The first comprehensive study of bugs and testing practices in LLM libraries, revealing that API misuse is the main problem |
large language model |
|
|
| 17 |
The Budget AI Researcher and the Power of RAG Chains |
Proposes the Budget AI Researcher framework, based on RAG chains, for generating more concrete and more interesting research ideas. |
large language model |
|
|
| 18 |
QGuard: Question-based Zero-shot Guard for Multi-modal LLM Safety |
Proposes QGuard to address safety issues in multimodal LLMs |
large language model |
|
|
| 19 |
The SWE-Bench Illusion: When State-of-the-Art LLMs Remember Instead of Reason |
Reveals the limitations of SWE-Bench: large language models may memorize rather than reason |
large language model |
|
|