| 1 |
DARES: Depth Anything in Robotic Endoscopic Surgery with Self-supervised Vector-LoRA of the Foundation Model |
DARES: Improving the Depth Anything model for robotic endoscopic surgery with self-supervised Vector-LoRA |
depth estimation monocular depth Depth Anything |
✅ |
|
| 2 |
Open-Vocabulary Action Localization with Iterative Visual Prompting |
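Proposes an open-vocabulary action localization method based on iterative visual prompting that localizes actions in video without any training. |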
open-vocabulary open vocabulary |
✅ |
|
| 3 |
AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding |
AdaptVision: Dynamic input scaling in MLLMs for versatile scene understanding |
scene understanding large language model multimodal |
✅ |
|
| 4 |
UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios |
UrBench: A comprehensive benchmark for evaluating large models in multi-view urban scenarios |
scene understanding multimodal |
|
|
| 5 |
Synthetic Lunar Terrain: A Multimodal Open Dataset for Training and Evaluating Neuromorphic Vision Algorithms |
Proposes the Synthetic Lunar Terrain (SLT) multimodal open dataset for training and evaluating neuromorphic vision algorithms. |
depth estimation multimodal |
|
|
| 6 |
OG-Mapping: Octree-based Structured 3D Gaussians for Online Dense Mapping |
OG-Mapping: An online dense mapping method based on octree-structured 3D Gaussians |
3D gaussian splatting 3DGS gaussian splatting |
|
|
| 7 |
2DGH: 2D Gaussian-Hermite Splatting for High-quality Rendering and Better Geometry Reconstruction |
Proposes 2D Gaussian splatting based on Gaussian-Hermite kernels, improving rendering quality and geometry reconstruction |
gaussian splatting splatting |
|
|
| 8 |
BOP-Distrib: Revisiting 6D Pose Estimation Benchmarks for Better Evaluation under Visual Ambiguities |
BOP-Distrib: Revisiting 6D pose estimation benchmarks to improve evaluation quality under visual ambiguities |
6D pose estimation |
|
|
| 9 |
ConDense: Consistent 2D/3D Pre-training for Dense and Sparse Features from Multi-View Images |
Proposes the ConDense framework to address the feature consistency problem in training 3D foundation models |
NeRF foundation model |
|
|