| 13 |
WATCH: World-aware Allied Trajectory and pose reconstruction for Camera and Human |
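Proposes the WATCH framework for accurately reconstructing the global motion trajectories of the camera and humans from monocular video |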
world model human motion human motion reconstruction |
|
|
| 14 |
PromptEnhancer: A Simple Approach to Enhance Text-to-Image Models via Chain-of-Thought Prompt Rewriting |
Proposes PromptEnhancer, which improves text-to-image generation models through chain-of-thought prompt rewriting. |
reinforcement learning chain-of-thought |
|
|
| 15 |
VCMamba: Bridging Convolutions with Multi-Directional Mamba for Efficient Visual Representation |
VCMamba: fuses convolutions with multi-directional Mamba for efficient visual representation |
Mamba SSM state space model |
✅ |
|
| 16 |
SAC-MIL: Spatial-Aware Correlated Multiple Instance Learning for Histopathology Whole Slide Image Classification |
Proposes SAC-MIL, which uses spatial-aware correlated multiple instance learning to classify histopathology whole slide images. |
SAC spatial relationship |
|
|
| 17 |
OccTENS: 3D Occupancy World Model via Temporal Next-Scale Prediction |
OccTENS: controllable, efficient generation of a 3D occupancy world model via temporal next-scale prediction. |
world model spatial relationship |
|
|
| 18 |
3D and 4D World Modeling: A Survey |
A comprehensive survey of 3D and 4D world modeling and generation, filling the gap left by the lack of systematic study in this area. |
world model occupancy grid |
✅ |
|
| 19 |
Guideline-Consistent Segmentation via Multi-Agent Refinement |
Proposes a multi-agent refinement framework to address guideline consistency in semantic segmentation |
reinforcement learning open-vocabulary open vocabulary |
|
|
| 20 |
Few-step Flow for 3D Generation via Marginal-Data Transport Distillation |
Proposes the MDT-dist framework, which accelerates sampling in 3D generation models through marginal-data transport distillation. |
distillation |
|
|
| 21 |
MICACL: Multi-Instance Category-Aware Contrastive Learning for Long-Tailed Dynamic Facial Expression Recognition |
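Proposes the MICACL framework to address class imbalance and spatiotemporal modeling in long-tailed dynamic facial expression recognition. |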
contrastive learning |
|
|
| 22 |
Focus Through Motion: RGB-Event Collaborative Token Sparsification for Efficient Object Detection |
Proposes FocusMamba, which achieves efficient object detection through RGB-Event collaborative token sparsification |
Mamba multimodal |
✅ |
|