| 1 |
MASTER: Multimodal Segmentation with Text Prompts |
Proposes MASTER, a multimodal segmentation framework that uses text prompts to improve RGB-Thermal fusion performance in complex scenes |
large language model multimodal |
|
|
| 2 |
PP-DocBee: Improving Multimodal Document Understanding Through a Bag of Tricks |
Proposes PP-DocBee to address document image understanding |
large language model multimodal |
✅ |
|
| 3 |
Leveraging Large Language Models For Scalable Vector Graphics Processing: A Review |
Review: using large language models to process scalable vector graphics |
large language model |
|
|
| 4 |
Adaptive Prototype Learning for Multimodal Cancer Survival Analysis |
Proposes Adaptive Prototype Learning (APL) for multimodal cancer survival analysis, improving prediction accuracy |
multimodal |
✅ |
|
| 5 |
DuCos: Duality Constrained Depth Super-Resolution via Foundation Model |
DuCos: a depth super-resolution method based on foundation models and Lagrangian duality |
foundation model |
|
|
| 6 |
The Role of Visual Modality in Multimodal Mathematical Reasoning: Challenges and Insights |
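Examines the role of the visual modality in multimodal mathematical reasoning and proposes the HC-M3D dataset to strengthen visual dependence |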
multimodal |
✅ |
|
| 7 |
FirePlace: Geometric Refinements of LLM Common Sense Reasoning for 3D Object Placement |
FirePlace: a 3D object placement framework that combines geometric constraints with common-sense reasoning |
large language model multimodal |
|
|
| 8 |
DSV-LFS: Unifying LLM-Driven Semantic Cues with Visual Features for Robust Few-Shot Segmentation |
DSV-LFS: fuses LLM semantic cues with visual features to improve the robustness of few-shot segmentation |
large language model multimodal |
✅ |
|
| 9 |
RetinalGPT: A Retinal Clinical Preference Conversational Assistant Powered by Large Vision-Language Models |
RetinalGPT: a retinal clinical-preference conversational assistant based on large vision-language models |
large language model multimodal |
✅ |
|
| 10 |
Gate-Shift-Pose: Enhancing Action Recognition in Sports with Skeleton Information |
Gate-Shift-Pose: a sports action recognition method that fuses skeleton information, improving fall detection accuracy in figure skating |
multimodal |
✅ |
|
| 11 |
TPC: Cross-Temporal Prediction Connection for Vision-Language Model Hallucination Reduction |
Proposes Cross-Temporal Prediction Connection (TPC) to reduce hallucination in vision-language models |
large language model |
|
|
| 12 |
ToFu: Visual Tokens Reduction via Fusion for Multi-modal, Multi-patch, Multi-image Task |
ToFu: a visual token fusion method for improving the efficiency of multimodal, multi-image tasks |
multimodal |
|
|