| 11 |
Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification |
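Proposes the Data Adaptive Traceback (DAT) framework to improve the performance of vision-language foundation models on image classification |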
contrastive learning foundation model |
|
|
| 12 |
MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine |
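MAVIS: uses an automatic data engine for mathematical visual instruction tuning to improve the mathematical ability of multimodal large language models |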
DPO direct preference optimization contrastive learning |
✅ |
|
| 13 |
Emergent Visual-Semantic Hierarchies in Image-Text Representations |
Finds that VLMs such as CLIP exhibit an emergent understanding of visual-semantic hierarchies, and proposes the Radial Embedding framework to optimize it. |
representation learning large language model foundation model |
|
|
| 14 |
VideoMamba: Spatio-Temporal Selective State Space Model |
VideoMamba: a spatio-temporal selective state space model for video recognition |
Mamba SSM state space model |
|
|
| 15 |
SR-Mamba: Effective Surgical Phase Recognition with State Space Model |
SR-Mamba: effective surgical phase recognition using a state space model |
Mamba state space model |
✅ |
|
| 16 |
GraphMamba: An Efficient Graph Structure Learning Vision Mamba for Hyperspectral Image Classification |
Proposes GraphMamba to efficiently learn graph structure and sequential features for hyperspectral image classification. |
Mamba HSI |
|
|
| 17 |
SliceMamba with Neural Architecture Search for Medical Image Segmentation |
Proposes SliceMamba, combined with neural architecture search, to improve medical image segmentation performance |
Mamba representation learning |
|
|
| 18 |
DegustaBot: Zero-Shot Visual Preference Estimation for Personalized Multi-Object Rearrangement |
DegustaBot: zero-shot visual preference estimation for personalized multi-object rearrangement |
preference learning foundation model |
|
|
| 19 |
Exemplar-free Continual Representation Learning via Learnable Drift Compensation |
Proposes learnable drift compensation to address exemplar-free continual representation learning |
representation learning |
✅ |
|
| 20 |
FYI: Flip Your Images for Dataset Distillation |
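Proposes FYI: enhances dataset distillation through image flipping, improving the semantic expressiveness of small distilled sample sets |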
distillation |
|
|