| 1 |
Hallucination-Aware Multimodal Benchmark for Gastrointestinal Image Analysis with Large Vision-Language Models |
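Proposes the Gut-VLM dataset and applies hallucination-aware fine-tuning to improve the reliability of VLMs in gastrointestinal image analysis. |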
multimodal |
✅ |
|
| 2 |
Seed1.5-VL Technical Report |
Proposes Seed1.5-VL to improve general-purpose multimodal understanding and reasoning |
foundation model multimodal |
|
|
| 3 |
Building a Human-Verified Clinical Reasoning Dataset via a Human LLM Hybrid Pipeline for Trustworthy Medical AI |
Builds a clinical reasoning dataset through a human-LLM hybrid pipeline to improve the trustworthiness of medical AI |
large language model chain-of-thought |
|
|
| 4 |
Visual Instruction Tuning with Chain of Region-of-Interest |
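Proposes CoRoI, a visual instruction tuning method based on a chain of regions of interest, to improve the efficiency of multimodal large models on high-resolution images. |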
large language model multimodal |
|
|
| 5 |
DeepSORT-Driven Visual Tracking Approach for Gesture Recognition in Interactive Systems |
Applies a DeepSORT-based visual tracking method to gesture recognition in interactive systems |
multimodal |
|
|
| 6 |
DAPE: Dual-Stage Parameter-Efficient Fine-Tuning for Consistent Video Editing with Diffusion Models |
DAPE: a dual-stage parameter-efficient fine-tuning framework for consistent video editing with diffusion models |
multimodal |
|
|
| 7 |
Efficient and Robust Multidimensional Attention in Remote Physiological Sensing through Target Signal Constrained Factorization |
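Proposes a TSFM-constrained multidimensional attention mechanism to improve cross-domain generalization in remote physiological sensing |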
multimodal |
✅ |
|