| 1 |
Multi-modal On-Device Learning for Monocular Depth Estimation on Ultra-low-power MCUs |
Proposes a multi-modal on-device learning method for monocular depth estimation on ultra-low-power MCUs. |
depth estimation monocular depth |
|
|
| 2 |
Unlocking Zero-shot Potential of Semi-dense Image Matching via Gaussian Splatting |
MatchGS: unlocks the zero-shot potential of semi-dense image matching using Gaussian splatting |
3D gaussian splatting 3DGS gaussian splatting |
|
|
| 3 |
HTTM: Head-wise Temporal Token Merging for Faster VGGT |
Proposes head-wise temporal token merging (HTTM) to accelerate VGGT for fast 3D scene reconstruction |
scene reconstruction VGGT |
|
|
| 4 |
PathReasoning: A multimodal reasoning agent for query-based ROI navigation on whole-slide images |
PathReasoning: a multimodal reasoning agent for query-based ROI navigation on whole-slide images |
navigation |
|
|
| 5 |
SurgMLLMBench: A Multimodal Large Language Model Benchmark Dataset for Surgical Scene Understanding |
SurgMLLMBench: a multimodal large language model benchmark dataset for surgical scene understanding |
scene understanding |
|
|
| 6 |
Endo-G$^{2}$T: Geometry-Guided & Temporally Aware Time-Embedded 4DGS For Endoscopic Scenes |
Endo-G²T: proposes a geometry-guided, temporally aware, time-embedded 4D Gaussian splatting method for endoscopic scenes |
monocular depth gaussian splatting |
|
|
| 7 |
Scenes as Tokens: Multi-Scale Normal Distributions Transform Tokenizer for General 3D Vision-Language Understanding |
Proposes NDTokenizer3D, a multi-scale NDT tokenizer for general 3D vision-language understanding |
scene understanding point cloud |
|
|
| 8 |
FaithFusion: Harmonizing Reconstruction and Generation via Pixel-wise Information Gain |
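FaithFusion: proposes a 3DGS-diffusion fusion framework based on pixel-wise information gain for controllable driving-scene reconstruction and generation. |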
3DGS scene reconstruction |
✅ |
|
| 9 |
AmodalGen3D: Generative Amodal 3D Object Reconstruction from Sparse Unposed Views |
Proposes AmodalGen3D to address 3D object reconstruction from sparse views |
scene reconstruction |
|
|
| 10 |
MoGAN: Improving Motion Quality in Video Diffusion via Few-Step Motion Adversarial Post-Training |
MoGAN: improves motion quality in video diffusion models via few-step motion adversarial post-training |
optical flow |
✅ |
|