cs.CV (2025-12-15)

📊 24 papers in total | 🔗 6 with code

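🎯 Interest Area Navigation

Pillar 3: Spatial Perception (Perception & SLAM) (11 🔗5) | Pillar 2: RL Algorithms and Architecture (RL & Architecture) (9 🔗1) | Pillar 4: Generative Motion (Generative Motion) (1) | Pillar 7: Motion Retargeting (Motion Retargeting) (1) | Pillar 5: Interaction and Reaction (Interaction & Reaction) (1) | Pillar 8: Physics-based Animation (Physics-based Animation) (1)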

🔬 Pillar 3: Spatial Perception (Perception & SLAM) (11 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
1	StarryGazer: Leveraging Monocular Depth Estimation Models for Domain-Agnostic Single Depth Image Completion	StarryGazer: uses monocular depth estimation models for domain-agnostic single depth image completion	depth estimation monocular depth
2	Nexels: Neurally-Textured Surfels for Real-Time Novel View Synthesis with Sparse Geometries	Proposes a novel view synthesis method based on neurally textured surfels that renders in real time with sparse geometry.	3D gaussian splatting gaussian splatting novel view synthesis
3	Charge: A Comprehensive Novel View Synthesis Benchmark and Dataset to Bind Them All	Proposes the Charge dataset, a comprehensive benchmark for high-quality novel view synthesis.	novel view synthesis scene reconstruction optical flow
4	Computer vision training dataset generation for robotic environments using Gaussian splatting	Proposes a Gaussian-splatting-based pipeline for generating computer vision training datasets for robotic environments	3D gaussian splatting 3DGS gaussian splatting
5	MMDrive: Interactive Scene Understanding Beyond Vision with Multi-representational Fusion	MMDrive: proposes an interactive scene understanding framework with multimodal fusion that goes beyond the limits of vision	scene understanding point cloud
6	TWLR: Text-Guided Weakly-Supervised Lesion Localization and Severity Regression for Explainable Diabetic Retinopathy Grading	Proposes the TWLR framework, which uses text-guided weakly supervised learning for diabetic retinopathy grading and lesion localization.	localization
7	LASER: Layer-wise Scale Alignment for Training-Free Streaming 4D Reconstruction	Proposes LASER to address the need for training in streaming 4D reconstruction	pose estimation VGGT	✅
8	LitePT: Lighter Yet Stronger Point Transformer	LitePT: a lighter yet stronger point cloud Transformer that improves performance by effectively combining convolution and attention.	point cloud	✅
9	I-Scene: 3D Instance Models are Implicit Generalizable Spatial Learners	I-Scene: uses a pretrained 3D instance generator for generalizable implicit scene-level spatial learning	scene understanding	✅
10	DePT3R: Joint Dense Point Tracking and 3D Reconstruction of Dynamic Scenes in a Single Forward Pass	DePT3R: joint dense point tracking and 3D reconstruction of dynamic scenes in a single forward pass	scene understanding	✅
11	VoroLight: Learning Quality Volumetric Voronoi Meshes from General Inputs	VoroLight: proposes a 3D shape reconstruction framework for general inputs based on differentiable Voronoi diagrams	point cloud	✅

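🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (9 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
12	Motus: A Unified Latent Action World Model	Proposes Motus to unify multimodal generative capabilities	world model optical flow
13	Recurrent Video Masked Autoencoders	Proposes RVM, a video masked autoencoder built on a Transformer-based recurrent neural network, for efficient video representation learning.	representation learning masked autoencoder
14	MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning	MindDrive: proposes a vision-language-action model for autonomous driving based on online reinforcement learning.	reinforcement learning imitation learning
15	Self-Supervised Ultrasound Representation Learning for Renal Anomaly Prediction in Prenatal Imaging	Proposes USF-MAE, a self-supervised model for automatic prediction of renal anomalies in prenatal ultrasound.	representation learning MAE
16	SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning	Proposes SAGE, which uses reinforcement learning to train smart any-horizon agents for long video reasoning.	reinforcement learning
17	LongVie 2: Multimodal Controllable Ultra-Long Video World Model	LongVie 2: a multimodal controllable ultra-long video world model for high-quality long-horizon video generation.	world model
18	ADHint: Adaptive Hints with Difficulty Priors for Reinforcement Learning	ADHint: adaptive-hint reinforcement learning with difficulty priors, improving reasoning ability and generalization	reinforcement learning
19	RecTok: Reconstruction Distillation along Rectified Flow	RecTok: overcomes the performance bottleneck of high-dimensional visual tokenizers through reconstruction distillation along rectified flow	flow matching classifier-free guidance	✅
20	AgentIAD: Tool-Augmented Single-Agent for Industrial Anomaly Detection	AgentIAD: a tool-augmented single-agent framework for industrial anomaly detection	reinforcement learning reward design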

🔬 Pillar 4: Generative Motion (Generative Motion) (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
21	MoLingo: Motion-Language Alignment for Text-to-Motion Generation	MoLingo: text-to-motion generation through motion-language alignment, reaching a new SOTA.	text-to-motion motion generation motion latent

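🔬 Pillar 7: Motion Retargeting (Motion Retargeting) (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
22	Grab-3D: Detecting AI-Generated Videos from 3D Geometric Temporal Consistency	Proposes Grab-3D, which detects AI-generated videos using 3D geometric temporal consistency	geometric consistency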

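🔬 Pillar 5: Interaction and Reaction (Interaction & Reaction) (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
23	3D Human-Human Interaction Anomaly Detection	Proposes IADNet for detecting anomalous behavior in 3D human-human interactions	collaborative motion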

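🔬 Pillar 8: Physics-based Animation (Physics-based Animation) (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
24	KlingAvatar 2.0 Technical Report	Proposes KlingAvatar 2.0 to address efficiency and consistency in long video generation	character control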
