| 1 |
Qumus: Realization of An Embodied AI Quantum Material Experimentalist |
Qumus: realizes an embodied AI quantum materials experimentalist, with the first AI-created graphene and nanodevices. |
embodied AI large language model multimodal |
|
|
| 2 |
SVFSearch: A Multimodal Knowledge-Intensive Benchmark for Short-Video Frame Search in the Gaming Vertical Domain |
Proposes SVFSearch, a multimodal knowledge-intensive benchmark for frame search in gaming short videos. |
large language model multimodal visual grounding |
|
|
| 3 |
Visualizing the Invisible: Generative Visual Grounding Empowers Universal EEG Understanding in MLLMs |
Proposes the Generative Visual Grounding (GVG) framework to improve MLLMs' understanding of EEG signals. |
foundation model visual grounding |
|
|
| 4 |
Safety Geometry Collapse in Multimodal LLMs and Adaptive Drift Correction |
Proposes ReGap, an adaptive drift correction method that addresses safety geometry collapse in multimodal LLMs. |
large language model multimodal |
|
|
| 5 |
Estimating Item Difficulty with Large Language Models as Experts |
Uses large language models as experts to estimate item difficulty without requiring response data. |
large language model |
|
|
| 6 |
TeleCom-Bench: How Far Are Large Language Models from Industrial Telecommunication Applications? |
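TeleCom-Bench: evaluates the capability gap of large language models in industrial telecommunication applications and provides domain-alignment guidance. |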
large language model |
✅ |
|
| 7 |
TierCheck: Tiered Checkpointing for Fault Tolerance in Large Language Model Training |
TierCheck: a heterogeneous, fault-tolerant tiered checkpointing system for large language model training. |
large language model |
|
|
| 8 |
DuIVRS-2: An LLM-based Interactive Voice Response System for Large-scale POI Attribute Acquisition |
DuIVRS-2: an LLM-based interactive voice response system for large-scale POI attribute acquisition. |
large language model chain-of-thought |
|
|
| 9 |
Evaluating Cognitive Age Alignment in Interactive AI Agents |
Proposes ChildAgentEval to evaluate the degree of cognitive age alignment in interactive AI agents. |
large language model multimodal |
|
|
| 10 |
Prompt2Fingerprint: Plug-and-Play LLM Fingerprinting via Text-to-Weight Generation |
Proposes Prompt2Fingerprint, which enables plug-and-play LLM fingerprinting via text-to-weight generation. |
large language model |
|
|
| 11 |
Guard: Scalable Straggler Detection and Node Health Management for Large-Scale Training |
Guard: a scalable straggler detection and node health management system for large-scale training. |
foundation model |
|
|
| 12 |
Democratizing Large-Scale Re-Optimization with LLM-Guided Model Patches |
Proposes a large-scale re-optimization framework based on LLM-guided model patches, empowering non-expert users. |
large language model |
|
|
| 13 |
SCICONVBENCH: Benchmarking LLMs on Multi-Turn Clarification for Task Formulation in Computational Science |
SCICONVBENCH: a benchmark for evaluating LLMs' multi-turn clarification ability for task formulation in computational science. |
large language model |
✅ |
|
| 14 |
Prompts Don't Protect: Architectural Enforcement via MCP Proxy for LLM Tool Access Control |
Proposes an MCP proxy architecture that secures LLM tool use through enforced access control. |
large language model |
|
|
| 15 |
QSTRBench: a New Benchmark to Evaluate the Ability of Language Models to Reason with Qualitative Spatial and Temporal Calculi |
Proposes QSTRBench to evaluate the spatial and temporal reasoning abilities of language models. |
large language model |
|
|
| 16 |
The Hidden Cost of Contextual Sycophancy: an AI Literacy Intervention in Human-AI Collaboration |
Reveals contextual sycophancy in LLMs during human-AI collaboration and examines the effectiveness of an AI literacy intervention. |
large language model |
|
|
| 17 |
Evidence-Grounded Frontier Mapping and Agentic Hypothesis Generation in Nanomedicine |
pArticleMap: an evidence-grounded system for frontier exploration and hypothesis generation in nanomedicine. |
large language model |
|
|
| 18 |
Generative AI and the Productivity Divide: Human-AI Complementarities in Education |
Shows that productivity gains from generative AI in education are uneven, with AI interaction skill as the key factor. |
large language model |
|
|
| 19 |
A-ProS: Towards Reliable Autonomous Programming Through Multi-Model Feedback |
A-ProS: reliable autonomous programming through multi-model feedback. |
large language model |
|
|
| 20 |
Babel: Jailbreaking Safety Attention via Obfuscation Distribution Optimized Sampling |
Babel: jailbreaks safety attention mechanisms via optimized sampling of obfuscation distributions. |
large language model |
|
|
| 21 |
Reconciling Contradictory Views on the Effectiveness of SFT in LLMs: An Interaction Perspective |
From an interaction perspective, explains why the effectiveness of SFT in LLMs is inconsistent and provides training guidance. |
large language model |
|
|
| 22 |
BLAgent: Agentic RAG for File-Level Bug Localization |
BLAgent: an agentic RAG framework for file-level bug localization. |
large language model |
|
|
| 23 |
Agentic Chunking and Bayesian De-chunking of AI Generated Fuzzy Cognitive Maps: A Model of the Thucydides Trap |
Proposes LLM-agent-based automatic FCM construction with Bayesian de-chunking, used to analyze great-power conflict. |
large language model |
|
|
| 24 |
Interactive Evaluation Requires a Design Science |
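An interactive evaluation framework from a design science perspective, addressing the challenges of evaluating LLMs in complex environments. |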
large language model |
|
|