| 1 |
Towards Open-Vocabulary Industrial Defect Understanding with a Large-Scale Multimodal Dataset |
Proposes IMDD-1M, a large-scale industrial multimodal defect dataset for open-vocabulary industrial defect understanding. |
open-vocabulary open vocabulary foundation model |
✅ |
|
| 2 |
Improved 3D Gaussian Splatting of Unknown Spacecraft Structure Using Space Environment Illumination Knowledge |
Proposes a 3D Gaussian Splatting reconstruction method that uses knowledge of the Sun's position to handle dynamic illumination |
3D gaussian splatting 3DGS gaussian splatting |
|
|
| 3 |
ARM: A Learnable, Plug-and-Play Module for CLIP-based Open-vocabulary Semantic Segmentation |
Proposes the ARM module for CLIP-based open-vocabulary semantic segmentation |
open-vocabulary open vocabulary foundation model |
|
|
| 4 |
Guided Diffusion-based Generation of Adversarial Objects for Real-World Monocular Depth Estimation Attacks |
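Proposes a diffusion-based method for generating adversarial objects that improves the realism and effectiveness of attacks on monocular depth estimation |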
depth estimation monocular depth physically plausible |
|
|
| 5 |
Robust Egocentric Referring Video Object Segmentation via Dual-Modal Causal Intervention |
Proposes the CERES framework, which uses dual-modal causal intervention to address bias and confounding in Ego-RVOS |
metric depth egocentric |
|
|
| 6 |
Structure-Guided Allocation of 2D Gaussians for Image Representation and Compression |
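Proposes a structure-guided allocation method for 2D Gaussians that improves rate-distortion performance in image representation and compression |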
gaussian splatting splatting |
|
|
| 7 |
PipeFlow: Pipelined Processing and Motion-Aware Frame Selection for Long-Form Video Editing |
PipeFlow: a pipelined processing and motion-aware frame selection method for long-form video editing |
optical flow |
|
|