cs.CV (2025-12-15)

📊 24 papers in total | 🔗 6 with code

🎯 Research area navigation

Pillar 3: Spatial Perception (Perception & SLAM) (11, 🔗 5)
Pillar 2: RL Algorithms & Architecture (9, 🔗 1)
Pillar 4: Generative Motion (1)
Pillar 7: Motion Retargeting (1)
Pillar 5: Interaction & Reaction (1)
Pillar 8: Physics-based Animation (1)

🔬 Pillar 3: Spatial Perception (Perception & SLAM) (11 papers)

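# | Title | Summary | Tags
1 | StarryGazer: Leveraging Monocular Depth Estimation Models for Domain-Agnostic Single Depth Image Completion | StarryGazer: uses monocular depth estimation models for domain-agnostic single depth image completion. | depth estimation, monocular depth
2 | Nexels: Neurally-Textured Surfels for Real-Time Novel View Synthesis with Sparse Geometries | Proposes a novel view synthesis method based on neurally textured surfels that renders in real time with sparse geometry. | 3D gaussian splatting, gaussian splatting, novel view synthesis
3 | Charge: A Comprehensive Novel View Synthesis Benchmark and Dataset to Bind Them All | Proposes the Charge dataset, a comprehensive benchmark for high-quality novel view synthesis. | novel view synthesis, scene reconstruction, optical flow
4 | Computer vision training dataset generation for robotic environments using Gaussian splatting | Proposes a Gaussian-splatting-based pipeline for generating computer vision training datasets in robotic environments. | 3D gaussian splatting, 3DGS, gaussian splatting
5 | MMDrive: Interactive Scene Understanding Beyond Vision with Multi-representational Fusion | MMDrive: a multimodal-fusion framework for interactive scene understanding that goes beyond the limits of vision alone. | scene understanding, point cloud
6 | TWLR: Text-Guided Weakly-Supervised Lesion Localization and Severity Regression for Explainable Diabetic Retinopathy Grading | Proposes TWLR, a framework that uses text-guided weakly supervised learning for diabetic retinopathy grading and lesion localization. | localization
7 | LASER: Layer-wise Scale Alignment for Training-Free Streaming 4D Reconstruction | Proposes LASER to remove the need for training in streaming 4D reconstruction. | pose estimation, VGGT
8 | LitePT: Lighter Yet Stronger Point Transformer | LitePT: a lighter yet stronger point cloud Transformer that improves performance by effectively combining convolution and attention. | point cloud
9 | I-Scene: 3D Instance Models are Implicit Generalizable Spatial Learners | I-Scene: uses a pretrained 3D instance generator for generalizable, implicit scene-level spatial learning. | scene understanding
10 | DePT3R: Joint Dense Point Tracking and 3D Reconstruction of Dynamic Scenes in a Single Forward Pass | DePT3R: joint dense point tracking and 3D reconstruction of dynamic scenes in a single forward pass. | scene understanding
11 | VoroLight: Learning Quality Volumetric Voronoi Meshes from General Inputs | VoroLight: a 3D shape reconstruction framework for general inputs based on differentiable Voronoi diagrams. | point cloud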

🔬 Pillar 2: RL Algorithms & Architecture (9 papers)

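# | Title | Summary | Tags
12 | Motus: A Unified Latent Action World Model | Proposes Motus to unify multimodal generation capabilities in a single model. | world model, optical flow
13 | Recurrent Video Masked Autoencoders | Proposes RVM, a video masked autoencoder built on a Transformer-based recurrent neural network, for efficient video representation learning. | representation learning, masked autoencoder
14 | MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning | MindDrive: a vision-language-action model for autonomous driving trained with online reinforcement learning. | reinforcement learning, imitation learning
15 | Self-Supervised Ultrasound Representation Learning for Renal Anomaly Prediction in Prenatal Imaging | Proposes USF-MAE, a self-supervised model for automated prediction of renal anomalies from prenatal ultrasound. | representation learning, MAE
16 | SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning | Proposes SAGE, which uses reinforcement learning to train smart any-horizon agents for long video reasoning. | reinforcement learning
17 | LongVie 2: Multimodal Controllable Ultra-Long Video World Model | LongVie 2: a multimodal, controllable ultra-long video world model for high-quality long-horizon video generation. | world model
18 | ADHint: Adaptive Hints with Difficulty Priors for Reinforcement Learning | ADHint: reinforcement learning with adaptive hints guided by difficulty priors, improving reasoning ability and generalization. | reinforcement learning
19 | RecTok: Reconstruction Distillation along Rectified Flow | RecTok: reconstruction distillation along rectified flow overcomes the performance bottleneck of high-dimensional visual tokenizers. | flow matching, classifier-free guidance
20 | AgentIAD: Tool-Augmented Single-Agent for Industrial Anomaly Detection | AgentIAD: a tool-augmented single-agent framework for industrial anomaly detection. | reinforcement learning, reward design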

🔬 Pillar 4: Generative Motion (1 paper)

# | Title | Summary | Tags
21 | MoLingo: Motion-Language Alignment for Text-to-Motion Generation | MoLingo: text-to-motion generation via motion-language alignment, achieving new state-of-the-art results. | text-to-motion, motion generation, motion latent

🔬 Pillar 7: Motion Retargeting (1 paper)

# | Title | Summary | Tags
22 | Grab-3D: Detecting AI-Generated Videos from 3D Geometric Temporal Consistency | Proposes Grab-3D, which detects AI-generated videos using 3D geometric temporal consistency. | geometric consistency

🔬 Pillar 5: Interaction & Reaction (1 paper)

# | Title | Summary | Tags
23 | 3D Human-Human Interaction Anomaly Detection | Proposes IADNet to detect anomalous behavior in 3D human-human interactions. | collaborative motion

🔬 Pillar 8: Physics-based Animation (1 paper)

# | Title | Summary | Tags
24 | KlingAvatar 2.0 Technical Report | Proposes KlingAvatar 2.0 to address efficiency and consistency in long video generation. | character control
