| 1 | Can MLLMs Read Students' Minds? Unpacking Multimodal Error Analysis in Handwritten Math | Proposes the ScratchMath benchmark for multimodal large language model analysis of errors in handwritten math. | large language model, multimodal | |
| 2 | RubricEval: A Rubric-Level Meta-Evaluation Benchmark for LLM Judges in Instruction Following | RubricEval: a fine-grained meta-evaluation benchmark for assessing LLM instruction-following ability. | large language model, instruction following | |
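| 3 | CSI-tuples-based 3D Channel Fingerprints Construction Assisted by MultiModal Learning | Proposes a 3D channel fingerprint construction framework based on CSI tuples and multimodal learning, improving the accuracy of channel information acquisition for low-altitude communications. | multimodal | |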
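| 4 | Is Mathematical Problem-Solving Expertise in Large Language Models Associated with Assessment Performance? | Shows that an LLM's mathematical problem-solving ability is significantly correlated with its accuracy in assessing students' solution steps. | large language model | |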
| 5 | A Gait Foundation Model Predicts Multi-System Health Phenotypes from 3D Skeletal Motion | Proposes a gait foundation model that predicts multi-system health phenotypes from 3D skeletal motion. | foundation model | |
| 6 | Large Language Models as Optimization Controllers: Adaptive Continuation for SIMP Topology Optimization | Proposes an LLM-based adaptive control method for topology optimization, improving the performance of the SIMP algorithm. | large language model | |
| 7 | AD-CARE: A Guideline-grounded, Modality-agnostic LLM Agent for Real-world Alzheimer's Disease Diagnosis with Multi-cohort Assessment, Fairness Analysis, and Reader Study | AD-CARE: a guideline-grounded, modality-agnostic LLM agent for real-world Alzheimer's disease diagnosis. | large language model, multimodal | |
| 8 | TopoPilot: Reliable Conversational Workflow Automation for Topological Data Analysis and Visualization | TopoPilot: a reliable conversational workflow automation framework for topological data analysis and visualization. | large language model | |
| 9 | SliderQuant: Accurate Post-Training Quantization for LLMs | SliderQuant: an accurate post-training quantization framework for LLMs that improves quantization accuracy across different layers. | large language model | |
| 10 | WebTestBench: Evaluating Computer-Use Agents towards End-to-End Automated Web Testing | WebTestBench: a benchmark for evaluating computer-use agents on end-to-end automated web testing. | large language model | ✅ |
| 11 | PIDP-Attack: Combining Prompt Injection with Database Poisoning Attacks on Retrieval-Augmented Generation Systems | Proposes PIDP-Attack, which combines prompt injection with database poisoning to attack RAG systems. | large language model | |
| 12 | Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills | Trace2Skill: distills transferable agent skills from trajectory-local lessons. | large language model | |
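| 13 | Factors Influencing the Quality of AI-Generated Code: A Synthesis of Empirical Evidence | A systematic review of the factors that influence the quality of AI-generated code, highlighting the key role of human-AI collaboration. | large language model | |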
| 14 | ElephantBroker: A Knowledge-Grounded Cognitive Runtime for Trustworthy AI Agents | ElephantBroker: a knowledge-grounded cognitive runtime for trustworthy AI agents. | large language model | |
| 15 | Sparse Visual Thought Circuits in Vision-Language Models | Investigates the composability and controllability of sparse visual thought circuits in vision-language models. | multimodal | |
| 16 | From Stateless to Situated: Building a Psychological World for LLM-Based Emotional Support | LEKIA 2.0: builds a situation-aware LLM psychological support system that addresses the lack of state in multi-turn dialogue. | large language model | |
| 17 | The Anatomy of Uncertainty in LLMs | Proposes a framework for decomposing LLM uncertainty, improving model reliability and detecting hallucinations. | large language model | |
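| 18 | Evaluating adaptive and generative AI-based feedback and recommendations in a knowledge-graph-integrated programming learning system | Proposes a knowledge-graph-integrated framework for adaptive and generative AI feedback in programming learning, improving learning outcomes. | large language model | |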
| 19 | LogitScope: A Framework for Analyzing LLM Uncertainty Through Information Metrics | LogitScope: a lightweight framework for analyzing LLM uncertainty through information metrics. | large language model | |
|