| 16 |
CL-HOI: Cross-Level Human-Object Interaction Distillation from Vision Large Language Models |
Proposes the CL-HOI framework, which distills knowledge from vision large language models to achieve annotation-free human-object interaction detection.
distillation human-object interaction HOI |
|
|
| 17 |
LLaVA-KD: A Framework of Distilling Multimodal Large Language Models |
LLaVA-KD: a framework for distilling multimodal large language models.
distillation large language model multimodal |
✅ |
|
| 18 |
Few-shot target-driven instance detection based on open-vocabulary object detection models |
Proposes a lightweight method that leverages open-vocabulary object detection models for few-shot, target-driven instance detection.
world model open-vocabulary open vocabulary |
|
|
| 19 |
START: A Generalized State Space Model with Saliency-Driven Token-Aware Transformation |
Proposes START, a generalized state space model with saliency-driven token-aware transformation, to improve domain generalization.
Mamba SSM state space model |
✅ |
|
| 20 |
MBPU: A Plug-and-Play State Space Model for Point Cloud Upsampling with Fast Point Rendering |
Proposes MBPU, a Mamba-based network for large-scale point cloud upsampling with reduced artifacts.
Mamba state space model |
|
|
| 21 |
Exploring Stronger Transformer Representation Learning for Occluded Person Re-Identification |
Proposes SSSC-TransReID to strengthen Transformer feature representations for person re-identification in occluded scenes.
representation learning contrastive learning |
|
|
| 22 |
Joker: Conditional 3D Head Synthesis with Extreme Facial Expressions |
Joker: conditional diffusion-based 3D head synthesis with extreme facial expressions.
distillation NeRF neural radiance field |
|
|
| 23 |
YOLO11 and Vision Transformers based 3D Pose Estimation of Immature Green Fruits in Commercial Apple Orchards for Robotic Thinning |
Proposes a YOLO11 and Vision Transformer based method for 3D pose estimation of immature green apples, supporting robotic fruit thinning.
MAE depth estimation Depth Anything |
|
|
| 24 |
LMHaze: Intensity-aware Image Dehazing with a Large-scale Multi-intensity Real Haze Dataset |
Introduces LMHaze, a large-scale real-world multi-intensity haze dataset, and designs an MoE-Mamba model that improves image dehazing performance.
Mamba multimodal |
|
|
| 25 |
Robust Visual Representation Learning with Multi-modal Prior Knowledge for Image Classification Under Distribution Shift |
Proposes KGV, a knowledge-guided visual representation learning method, to improve the generalization of image classification under distribution shift.
representation learning |
|
|
| 26 |
Learning from Neighbors: Category Extrapolation for Long-Tail Learning |
Proposes a neighbor-based category extrapolation method to address the poor generalization of tail classes in long-tail learning.
representation learning large language model |
|
|
| 27 |
Are Large-scale Soft Labels Necessary for Large-scale Dataset Distillation? |
Proposes within-class supervised dataset distillation, substantially compressing soft-label size while improving performance.
distillation |
✅ |
|
| 28 |
Contrastive Learning with Auxiliary User Detection for Identifying Activities |
Proposes the CLAUDIA framework, which uses contrastive learning with auxiliary user detection to improve user- and context-aware human activity recognition.
contrastive learning |
|
|
| 29 |
TIPS: Text-Image Pretraining with Spatial awareness |
Proposes TIPS to address the lack of spatial awareness in image-text representation learning.
representation learning depth estimation |
✅ |
|