cs.CV (2025-01-22)

📊 17 papers in total | 🔗 3 with code

🎯 Navigation by Area of Interest

Pillar 3: Spatial Perception & Semantics (11 papers, 🔗 1 with code) · Pillar 2: RL Algorithms & Architecture (3 papers, 🔗 1 with code) · Pillar 1: Robot Control (2 papers, 🔗 1 with code) · Pillar 9: Embodied Foundation Models (1 paper)

🔬 Pillar 3: Spatial Perception & Semantics (11 papers)

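| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | 3DGS$^2$: Near Second-order Converging 3D Gaussian Splatting | Proposes 3DGS$^2$, which uses a near second-order converging algorithm to accelerate 3D Gaussian Splatting training and substantially improve training efficiency. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 2 | GS-LiDAR: Generating Realistic LiDAR Point Clouds with Panoramic Gaussian Splatting | GS-LiDAR generates realistic LiDAR point clouds with panoramic Gaussian splatting, improving simulation for autonomous driving systems. | gaussian splatting, splatting, NeRF | |
| 3 | DynamicEarth: How Far are We from Open-Vocabulary Change Detection? | Introduces open-vocabulary change detection, which is not restricted to the fixed, predefined change categories of existing methods. | open-vocabulary, open vocabulary, foundation model | |
| 4 | Enhancing Monocular Depth Estimation with Multi-Source Auxiliary Tasks | Proposes a monocular depth estimation method that uses multi-source auxiliary tasks to improve data efficiency and depth prediction quality. | depth estimation, monocular depth, foundation model | |
| 5 | DWTNeRF: Boosting Few-shot Neural Radiance Fields via Discrete Wavelet Transform | DWTNeRF uses the discrete wavelet transform to improve few-shot neural radiance field performance. | 3DGS, NeRF, neural radiance field | |
| 6 | Neural Radiance Fields for the Real World: A Survey | A comprehensive survey of recent advances, applications, and open challenges for neural radiance fields (NeRF) in real-world settings. | NeRF, neural radiance field, scene understanding | |
| 7 | Sketch and Patch: Efficient 3D Gaussian Representation for Man-Made Scenes | Proposes an efficient sketch-and-patch 3D Gaussian representation for man-made scenes that substantially reduces storage requirements. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 8 | MONA: Moving Object Detection from Videos Shot by Dynamic Camera | Proposes the MONA framework for detecting and segmenting moving objects in videos shot by a moving camera. | optical flow | |
| 9 | Orchid: Image Latent Diffusion for Joint Appearance and Geometry Generation | Orchid is an image latent diffusion model that jointly generates appearance and geometry. | monocular depth | |
| 10 | Machine Learning Modeling for Multi-order Human Visual Motion Processing | Proposes a dual-pathway model of the V1-MT pathway to bring higher-order human visual motion perception to machine vision. | optical flow | |
| 11 | Separated Inter/Intra-Modal Fusion Prompts for Compositional Zero-Shot Learning | Proposes a prompt learning method with separate inter-modal and intra-modal fusion for compositional zero-shot learning. | scene understanding | |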

🔬 Pillar 2: RL Algorithms & Architecture (3 papers)

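| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 12 | MEDFORM: A Foundation Model for Contrastive Learning of CT Imaging and Clinical Numeric Data in Multi-Cancer Analysis | MEDFORM is a foundation model trained with contrastive learning on CT images and clinical numeric data for multi-cancer analysis. | representation learning, contrastive learning, foundation model | |
| 13 | TeD-Loc: Text Distillation for Weakly Supervised Object Localization | Proposes TeD-Loc, which uses text distillation for weakly supervised object localization and improves both localization accuracy and efficiency. | distillation | |
| 14 | DocTTT: Test-Time Training for Handwritten Document Recognition Using Meta-Auxiliary Learning | DocTTT is a test-time training method for handwritten document recognition based on meta-auxiliary learning. | masked autoencoder, MAE | |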

🔬 Pillar 1: Robot Control (2 papers)

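| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 15 | ViDDAR: Vision Language Model-Based Task-Detrimental Content Detection for Augmented Reality | ViDDAR is a vision language model-based system that detects task-detrimental content in augmented reality. | manipulation | |
| 16 | 3D Object Manipulation in a Single Image using Generative Models | Proposes the OMG3D framework, which combines geometric control with diffusion models for realistic 3D object manipulation and dynamic generation within a single image. | manipulation | |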

🔬 Pillar 9: Embodied Foundation Models (1 paper)

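| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 17 | VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding | VideoLLaMA3 is a family of frontier multimodal foundation models for image and video understanding. | foundation model, multimodal | |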
