| 1 | Towards Unified Multimodal Misinformation Detection in Social Media: A Benchmark Dataset and Baseline | Proposes the OmniFake dataset and the UMFDet framework to unify detection of human-crafted and AI-generated multimodal misinformation on social media. | multimodal chain-of-thought | | |
| 2 | CHAI: Command Hijacking against embodied AI | Introduces CHAI, a command hijacking attack against embodied AI. | embodied AI multimodal | | |
| 3 | Emergent evaluation hubs in a decentralizing large language model ecosystem | Reveals the trend toward centralization of evaluation benchmarks in the large language model ecosystem and its implications. | large language model foundation model | | |
| 4 | Reasoning-Aware Prompt Orchestration: A Foundation Model for Multi-Agent Language Model Coordination | Proposes a reasoning-aware prompt orchestration framework for coordinated reasoning across multi-agent language models. | large language model foundation model | | |
| 5 | Drones that Think on their Feet: Sudden Landing Decisions with Embodied AI | Uses embodied AI to let drones make autonomous safe-landing decisions in sudden emergencies. | embodied AI | | |
| 6 | CoLLM-NAS: Collaborative Large Language Models for Efficient Knowledge-Guided Neural Architecture Search | Proposes CoLLM-NAS, which uses collaborating large language models for efficient knowledge-guided neural architecture search. | large language model | | |
| 7 | Better with Less: Small Proprietary Models Surpass Large Language Models in Financial Transaction Understanding | Small proprietary models built for financial transactions outperform large language models at transaction understanding. | large language model | | |
| 8 | BiasBusters: Uncovering and Mitigating Tool Selection Bias in Large Language Models | BiasBusters: uncovers and mitigates tool selection bias in large language models. | large language model | | |
| 9 | OffTopicEval: When Large Language Models Enter the Wrong Chat, Almost Always! | OffTopicEval: evaluates the safety of large language models in off-topic settings and reveals their limited generalization. | large language model | | |
| 10 | TVS Sidekick: Challenges and Practical Insights from Deploying Large Language Models in the Enterprise | TVS Sidekick: challenges and practical insights from deploying large language models in the enterprise. | large language model | | |
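| 11 | STaR-Attack: A Spatio-Temporal and Narrative Reasoning Attack Framework for Unified Multimodal Understanding and Generation Models | Proposes the STaR-Attack framework, which exposes and exploits security vulnerabilities of unified multimodal models in spatio-temporal and narrative reasoning. | multimodal | | |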
| 12 | SeedPrints: Fingerprints Can Even Tell Which Seed Your Large Language Model Was Trained From | Proposes SeedPrints to address attribution verification for large language models. | large language model | | |
| 13 | AI Playing Business Games: Benchmarking Large Language Models on Managerial Decision-Making in Dynamic Simulations | Proposes an LLM benchmarking framework based on business game simulations to evaluate managerial decision-making in dynamic settings. | large language model | | |
| 14 | MEDAKA: Construction of Biomedical Knowledge Graphs Using Large Language Models | MEDAKA: builds biomedical knowledge graphs with large language models to improve drug safety and recommendation. | large language model | ✅ | |
| 15 | Evaluating the Use of Large Language Models as Synthetic Social Agents in Social Science Research | Evaluates the use of large language models as synthetic social agents in social science research, along with the caveats involved. | large language model | | |
| 16 | DeepJSONEval: Benchmarking Complex Nested JSON Data Mining for Large Language Models | DeepJSONEval: a new benchmark for evaluating LLMs on complex nested JSON data mining. | large language model | ✅ | |
| 17 | Galton's Law of Mediocrity: Why Large Language Models Regress to the Mean and Fail at Creativity in Advertising | Reveals a "Galton's law" effect in which large language models regress toward mediocrity in advertising creativity. | large language model | | |
| 18 | SOCK: A Benchmark for Measuring Self-Replication in Large Language Models | SOCK: a benchmark for measuring the self-replication capabilities of large language models. | large language model | | |
| 19 | 90% Faster, 100% Code-Free: MLLM-Driven Zero-Code 3D Game Development | UniGen: an MLLM-based zero-code 3D game development framework that makes development 90% faster. | large language model multimodal | ✅ | |
| 20 | SafeMind: Benchmarking and Mitigating Safety Risks in Embodied LLM Agents | Proposes SafeMindBench and SafeMindAgent to evaluate and mitigate safety risks in embodied LLM agents. | large language model multimodal | | |
| 21 | LLM-based Multi-Agent Blackboard System for Information Discovery in Data Science | Proposes an LLM-based multi-agent blackboard system for information discovery in data science. | large language model | | |
| 22 | AgentFlux: Decoupled Fine-Tuning & Inference for On-Device Agentic Systems | AgentFlux: decouples fine-tuning and inference for on-device agentic systems, improving tool-calling accuracy. | large language model | | |
| 23 | The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain | Proposes Dragon Hatchling, a biologically inspired, interpretable Transformer-like language model. | large language model | | |
| 24 | Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents | Reveals misevolution risks in self-evolving LLM agents and proposes a systematic evaluation framework. | large language model | ✅ | |
| 25 | Lita: Light Agent Uncovers the Agentic Coding Capabilities of LLMs | Lita: a lightweight agent that uncovers the agentic coding capabilities of LLMs. | large language model | | |
| 26 | Collaborative Compression for Large-Scale MoE Deployment on Edge | Proposes a collaborative compression framework for efficient deployment of very large MoE models on edge devices. | large language model | | |
| 27 | ICL Optimized Fragility | ICL optimization improves general knowledge performance but reduces robustness on complex reasoning. | chain-of-thought | | |
| 28 | Data driven approaches in nanophotonics: A review of AI-enabled metadevices | Review: AI-driven nanophotonics, using data-driven methods to design metadevices. | large language model | | |
| 29 | Rearchitecting Datacenter Lifecycle for AI: A TCO-Driven Framework | Proposes a TCO-driven framework for the AI datacenter lifecycle that optimizes the build, refresh, and operation stages. | large language model | | |
| 30 | Communication-Efficient and Accurate Approach for Aggregation in Federated Low-Rank Adaptation | Proposes FLoRA-NA to address communication efficiency and accuracy in federated low-rank adaptation. | foundation model | | |
| 31 | Game-Time: Evaluating Temporal Dynamics in Spoken Language Models | Proposes the Game-Time benchmark to evaluate temporal dynamics in conversational spoken language models. | instruction following | ✅ | |
| 32 | Interactive Learning for LLM Reasoning | Proposes the ILR framework, which improves the independent reasoning ability of LLMs through interactive learning. | large language model | | |
| 33 | SlimPack: Fine-Grained Asymmetric Packing for Balanced and Efficient Variable-Length LLM Training | SlimPack: fine-grained asymmetric data packing for variable-length LLM training, improving balance and efficiency. | large language model | | |
| 34 | Human-Centered Evaluation of RAG outputs: a framework and questionnaire for human-AI collaboration | Proposes a human-centered framework and questionnaire for evaluating RAG outputs to improve human-AI collaboration. | large language model | | |
| 35 | LLM Agents for Knowledge Discovery in Atomic Layer Processing | Uses LLM agents for knowledge discovery in atomic layer processing. | large language model | | |
| 36 | Toward an Unbiased Collective Memory for Efficient LLM-Based Agentic 6G Cross-Domain Management | Proposes an unbiased collective memory framework for efficient LLM-based agentic 6G cross-domain management. | large language model | ✅ | |
| 37 | 'Too much alignment; not enough culture': Re-balancing cultural alignment practices in LLMs | Proposes the concept of "thick outputs" to rebalance cultural alignment practices in large language models. | large language model | | |
| 38 | Judging by Appearances? Auditing and Intervening Vision-Language Models for Bail Prediction | Audits and intervenes on vision-language models to improve the fairness and accuracy of bail prediction. | large language model | | |
| 39 | SafeEvalAgent: Toward Agentic and Self-Evolving Safety Evaluation of LLMs | Proposes SafeEvalAgent, enabling self-evolving LLM safety evaluation with dynamic benchmark generation. | large language model | | |
| 40 | Accelerating LLM Inference with Precomputed Query Storage | StorInfer: accelerates LLM inference with precomputed query storage, especially in resource-constrained environments. | large language model | | |
| 41 | Chain-in-Tree: Back to Sequential Reasoning in LLM Tree Search | Chain-in-Tree: improves the efficiency of LLM tree search through a dynamic branching strategy. | large language model | ✅ | |
| 42 | HNote: Extending YNote with Hexadecimal Encoding for Fine-Tuning LLMs in Music Modeling | Proposes HNote, a hexadecimal-encoded music representation for fine-tuning LLMs in music modeling. | large language model | | |
| 43 | CustomIR: Unsupervised Fine-Tuning of Dense Embeddings for Known Document Corpora | CustomIR: uses unsupervised fine-tuning to improve dense embeddings for domain document corpora. | large language model | | |