| 1 |
Hallucination-Aware Multimodal Benchmark for Gastrointestinal Image Analysis with Large Vision-Language Models |
Proposes a multimodal benchmark to address hallucination in medical image analysis |
multimodal |
✅ |
|
| 2 |
Seed1.5-VL Technical Report |
Proposes Seed1.5-VL to address multimodal understanding and reasoning |
foundation model multimodal |
|
|
| 3 |
Building a Human-Verified Clinical Reasoning Dataset via a Human LLM Hybrid Pipeline for Trustworthy Medical AI |
Proposes a human-verified clinical reasoning dataset to address trust in medical AI |
large language model chain-of-thought |
|
|
| 4 |
Visual Instruction Tuning with Chain of Region-of-Interest |
Proposes Chain of Region-of-Interest to address the computational burden of high-resolution images |
large language model multimodal |
|
|
| 5 |
DeepSORT-Driven Visual Tracking Approach for Gesture Recognition in Interactive Systems |
Uses a DeepSORT-based visual tracking method to address gesture recognition in interactive systems |
multimodal |
|
|
| 6 |
DAPE: Dual-Stage Parameter-Efficient Fine-Tuning for Consistent Video Editing with Diffusion Models |
Proposes DAPE to address consistency and efficiency in video editing |
multimodal |
|
|
| 7 |
Efficient and Robust Multidimensional Attention in Remote Physiological Sensing through Target Signal Constrained Factorization |
Proposes target-signal-constrained factorization to address robustness in remote physiological signal monitoring |
multimodal |
✅ |
|