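| 1 | Cross-Attention Multimodal Fusion for Breast Cancer Diagnosis: Integrating Mammography and Clinical Data with Explainability | Proposes a breast cancer diagnosis method based on cross-attention multimodal fusion, improving diagnostic accuracy and interpretability. | multimodal | | |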
| 2 | Boosting Pathology Foundation Models via Few-shot Prompt-tuning for Rare Cancer Subtyping | PathPT: boosts pathology foundation models via few-shot prompt-tuning for rare cancer subtyping. | foundation model | | |
| 3 | Deep Learning-Driven Multimodal Detection and Movement Analysis of Objects in Culinary | Proposes a deep-learning-based multimodal fusion method for detecting culinary objects and analyzing their movements. | multimodal | | |
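| 4 | BasketLiDAR: The First LiDAR-Camera Multimodal Dataset for Professional Basketball MOT | BasketLiDAR: the first LiDAR-camera multimodal dataset for basketball multi-object tracking (MOT), together with a real-time tracking framework. | multimodal | | |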
| 5 | AeroDuo: Aerial Duo for UAV-based Vision and Language Navigation | AeroDuo: proposes a dual-UAV collaborative vision-and-language navigation framework for UAV navigation in complex environments. | VLN, large language model, multimodal | | |
| 6 | StreamMem: Query-Agnostic KV Cache Memory for Streaming Video Understanding | StreamMem: a query-agnostic KV cache memory mechanism for streaming video understanding. | large language model, multimodal | | |
| 7 | RCDINO: Enhancing Radar-Camera 3D Object Detection with DINOv2 Semantic Features | RCDINO: enhances radar-camera 3D object detection with DINOv2 semantic features. | foundation model, multimodal | ✅ | |
| 8 | VT-LVLM-AR: A Video-Temporal Large Vision-Language Model Adapter for Fine-Grained Action Recognition in Long-Term Videos | Proposes VT-LVLM-AR, which uses a video-temporal large vision-language model adapter to tackle fine-grained action recognition in long-term videos. | large language model | | |
| 9 | SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass | SceneGen: a framework that generates a 3D scene from a single image in one feedforward pass. | embodied AI | ✅ | |
| 10 | LLM-empowered Dynamic Prompt Routing for Vision-Language Models Tuning under Long-Tailed Distributions | Proposes the MDPR framework to address bias accumulation when fine-tuning VLMs under long-tailed distributions. | large language model | | |
| 11 | When and What: Diffusion-Grounded VideoLLM with Entity Aware Segmentation for Long Video Understanding | Grounded VideoDiT: a video LLM that combines a diffusion model with entity awareness to improve long video understanding. | TAMP | | |
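| 12 | First RAG, Second SEG: A Training-Free Paradigm for Camouflaged Object Detection | Proposes RAG-SEG, a training-free framework that addresses the difficulty of prompt generation in camouflaged object detection, achieving both high performance and high efficiency. | foundation model | ✅ | |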
| 13 | Comp-X: On Defining an Interactive Learned Image Compression Paradigm With Expert-driven LLM Agent | Proposes Comp-X, which uses an expert-driven LLM agent to enable intelligent, interactive image compression. | large language model | | |