| 1 |
Frequency-Adaptive Sharpness Regularization for Improving 3D Gaussian Splatting Generalization |
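Proposes Frequency-Adaptive Sharpness Regularization (FASR) to improve the generalization of 3D Gaussian Splatting in few-shot view synthesis. |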
3D gaussian splatting 3DGS gaussian splatting |
✅ |
|
| 2 |
Novel View Synthesis from A Few Glimpses via Test-Time Natural Video Completion |
Proposes a zero-shot novel view synthesis method based on video diffusion models to address scene reconstruction from sparse views. |
3D gaussian splatting gaussian splatting novel view synthesis |
|
|
| 3 |
ARIAL: An Agentic Framework for Document VQA with Precise Answer Localization |
Proposes the ARIAL framework, which takes an agentic approach to precise answer localization and extraction for document VQA. |
localization |
|
|
| 4 |
Plan-X: Instruct Video Generation via Semantic Planning |
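Plan-X guides video generation through semantic planning, significantly reducing visual hallucinations and improving instruction alignment. |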
scene understanding human-object interaction |
|
|
| 5 |
Muskie: Multi-view Masked Image Modeling for 3D Vision Pre-training |
Muskie: multi-view masked image modeling for 3D vision pre-training. |
pose estimation |
✅ |
|
| 6 |
AdaPerceiver: Transformers with Adaptive Width, Depth, and Tokens |
AdaPerceiver: proposes the first Transformer architecture that is adaptive in depth, width, and tokens. |
depth estimation |
|
|
| 7 |
Spotlight: Identifying and Localizing Video Generation Errors Using VLMs |
Spotlight: uses vision-language models to identify and localize video generation errors. |
localization |
|
|
| 8 |
VK-Det: Visual Knowledge Guided Prototype Learning for Open-Vocabulary Aerial Object Detection |
VK-Det: visual-knowledge-guided prototype learning for open-vocabulary aerial object detection. |
localization |
|
|