cs.CV (2025-04-17)

📊 27 papers in total | 🔗 12 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (8 🔗5) · Pillar 9: Embodied Foundation Models (8 🔗2) · Pillar 2: RL Algorithms & Architecture (7 🔗4) · Pillar 1: Robot Control (2 🔗1) · Pillar 8: Physics-based Animation (1) · Pillar 4: Generative Motion (1)

🔬 Pillar 3: Spatial Perception & Semantics (8 papers)

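| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Training-Free Hierarchical Scene Understanding for Gaussian Splatting with Superpoint Graphs | Proposes a training-free, superpoint-graph-based scene understanding framework for Gaussian Splatting that improves semantic consistency and efficiency. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 2 | Stronger, Steadier & Superior: Geometric Consistency in Depth VFM Forges Domain Generalized Semantic Segmentation | DepthForge: fuses depth information into VFMs to improve geometric consistency in domain-generalized semantic segmentation. | Depth Anything, geometric consistency, foundation model | |
| 3 | Digital Twin Generation from Visual Data: A Survey | Survey of research progress in digital twin generation from visual data. | 3D gaussian splatting, gaussian splatting, splatting | |
| 4 | ODHSR: Online Dense 3D Reconstruction of Humans and Scenes from Monocular Videos | Proposes ODHSR for online dense 3D reconstruction of humans and scenes from monocular videos. | 3D gaussian splatting, gaussian splatting, splatting | |
| 5 | Matrix-free Second-order Optimization of Gaussian Splats with Residual Sampling | Proposes a matrix-free second-order optimization method based on residual sampling to accelerate 3D Gaussian Splatting training. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 6 | Perception Encoder: The best visual embeddings are not at the output of the network | Proposes the Perception Encoder (PE), which uses vision-language contrastive learning to obtain the best visual embeddings for image and video understanding. | depth estimation, multimodal | |
| 7 | AerialMegaDepth: Learning Aerial-Ground Reconstruction and View Synthesis | AerialMegaDepth: learns aerial-ground reconstruction and view synthesis, addressing extreme viewpoint differences. | scene reconstruction | |
| 8 | SC3EF: A Joint Self-Correlation and Cross-Correspondence Estimation Framework for Visible and Thermal Image Registration | Proposes the SC3EF framework to address cross-modal discrepancies in visible and thermal image registration. | optical flow | |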

🔬 Pillar 9: Embodied Foundation Models (8 papers)

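| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 9 | LAD-Reasoner: Tiny Multimodal Models are Good Reasoners for Logical Anomaly Detection | LAD-Reasoner: proposes a logical anomaly detection method based on tiny multimodal models. | multimodal, chain-of-thought | |
| 10 | WildFireCan-MMD: A Multimodal Dataset for Classification of User-Generated Content During Wildfires in Canada | WildFireCan-MMD: presents a multimodal dataset for classifying user-generated content during wildfires in Canada. | multimodal | |
| 11 | Computer-Aided Design of Personalized Occlusal Positioning Splints Using Multimodal 3D Data | Proposes a computer-aided design method for personalized occlusal positioning splints based on multimodal 3D data. | multimodal | |
| 12 | High-Fidelity Image Inpainting with Multimodal Guided GAN Inversion | Proposes MMInvertFill, which achieves high-fidelity image inpainting via multimodal guided GAN inversion. | multimodal | |
| 13 | TongUI: Internet-Scale Trajectories from Multimodal Web Tutorials for Generalized GUI Agents | TongUI: builds generalized GUI agents from multimodal web tutorials, addressing the scarcity of trajectory data. | multimodal | |
| 14 | SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding | SmartFreeEdit: proposes a mask-free, spatially aware image editing framework with complex instruction understanding. | large language model, multimodal | |
| 15 | LIFT+: Lightweight Fine-Tuning for Long-Tail Learning | Proposes LIFT+, a lightweight fine-tuning framework that addresses the performance degradation caused by heavy fine-tuning in long-tail learning. | foundation model | |
| 16 | Vision and Language Integration for Domain Generalization | Proposes VLCA, which uses vision-language integration to bridge domain gaps and improve domain generalization. | multimodal | |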

🔬 Pillar 2: RL Algorithms & Architecture (7 papers)

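| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 17 | VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models | VistaDPO: proposes video hierarchical spatial-temporal direct preference optimization to improve the performance of large video models. | direct preference optimization, large language model, TAMP | |
| 18 | Post-pre-training for Modality Alignment in Vision-Language Foundation Models | Proposes CLIP-Refine, which aligns the modality feature spaces of vision-language models through post-pre-training. | distillation, foundation model | |
| 19 | Low-hallucination Synthetic Captions for Large-Scale Vision-Language Model Pre-training | Proposes a low-hallucination synthetic caption generation method for large-scale vision-language model pre-training. | DPO, large language model, multimodal | |
| 20 | Hierarchical Feature Learning for Medical Point Clouds via State Space Model | Proposes a state-space-model-based hierarchical feature learning framework for medical point clouds that improves anatomical structure classification, completion, and segmentation. | SSM, state space model | |
| 21 | SkyReels-V2: Infinite-length Film Generative Model | SkyReels-V2: proposes an infinite-length film generative model that addresses prompt consistency, visual quality, and motion dynamics in long video generation. | reinforcement learning, large language model | |
| 22 | EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance | EchoWorld: learns motion-aware world models for echocardiography probe guidance. | world model | |
| 23 | Enhancing Cocoa Pod Disease Classification via Transfer Learning and Ensemble Methods: Toward Robust Predictive Modeling | Combines transfer learning and ensemble methods to improve the robustness of cocoa pod disease classification. | predictive model | |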

🔬 Pillar 1: Robot Control (2 papers)

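| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 24 | TSGS: Improving Gaussian Splatting for Transparent Surface Reconstruction via Normal and De-lighting Priors | TSGS: improves Gaussian Splatting for transparent surface reconstruction via normal and de-lighting priors. | manipulation, depth estimation, 3D gaussian splatting | |
| 25 | Fully Unified Motion Planning for End-to-End Autonomous Driving | Proposes FUMP, which improves end-to-end autonomous driving motion planning by jointly learning from ego-vehicle and other-vehicle data. | motion planning | |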

🔬 Pillar 8: Physics-based Animation (1 paper)

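| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 26 | EventVAD: Training-Free Event-Aware Video Anomaly Detection | EventVAD: a training-free, event-aware video anomaly detection framework. | spatiotemporal, large language model, multimodal | |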

🔬 Pillar 4: Generative Motion (1 paper)

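| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 27 | TwoSquared: 4D Generation from 2D Image Pairs | Proposes TwoSquared to address 4D generation of dynamic objects. | physically plausible | |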
