cs.CV (2025-12-03)

📊 29 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception (Perception & SLAM) (14 🔗2) · Pillar 2: RL Algorithms & Architecture (6 🔗1) · Pillar 1: Robot Control (5 🔗2) · Pillar 4: Generative Motion (2 🔗1) · Pillar 7: Motion Retargeting (1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 3: Spatial Perception (Perception & SLAM) (14 papers)

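# | Title | One-line summary | Tags | 🔗
1 | C3G: Learning Compact 3D Representations with 2K Gaussians | C3G: learns compact 3D representations with 2K Gaussians to improve scene reconstruction and understanding | 3D gaussian splatting, gaussian splatting, novel view synthesis
2 | Motion4D: Learning 3D-Consistent Motion and Semantics for 4D Scene Understanding | Motion4D: learns 3D-consistent motion and semantics for 4D scene understanding | gaussian splatting, novel view synthesis, scene understanding
3 | SyncTrack4D: Cross-Video Motion Alignment and Video Synchronization for Multi-Video 4D Gaussian Splatting | SyncTrack4D: 4D Gaussian splatting reconstruction of dynamic scenes from unsynchronized multi-view videos | gaussian splatting
4 | Memory-Guided Point Cloud Completion for Dental Reconstruction | Proposes a memory-guided point cloud completion framework for dental reconstruction that improves completion accuracy | point cloud
5 | Mind-to-Face: Neural-Driven Photorealistic Avatar Synthesis via EEG Decoding | Mind-to-Face: the first framework for generating photorealistic face avatars by decoding EEG signals | 3D gaussian splatting, gaussian splatting
6 | MVRoom: Controllable 3D Indoor Scene Generation with Multi-View Diffusion Models | MVRoom: controllable 3D indoor scene generation based on multi-view diffusion models | novel view synthesis
7 | Emergent Outlier View Rejection in Visual Geometry Grounded Transformers | Discovers an implicit outlier suppression capability in VGGT, improving the robustness of 3D reconstruction from in-the-wild images | VGGT
8 | ReCamDriving: LiDAR-Free Camera-Controlled Novel Trajectory Video Generation | Proposes ReCamDriving, a purely vision-based, camera-controlled framework for novel-trajectory video generation | 3DGS
9 | Beyond Boundary Frames: Audio-Visual Semantic Guidance for Context-Aware Video Interpolation | Proposes the BBF framework, which uses audio-visual semantic guidance for context-aware video frame interpolation | optical flow
10 | GAOT: Generating Articulated Objects Through Text-Guided Diffusion Models | GAOT: proposes a framework for generating articulated objects with text-guided diffusion models | point cloud
11 | CartoMapQA: A Fundamental Benchmark Dataset Evaluating Vision-Language Models on Cartographic Map Understanding | CartoMapQA: proposes a fundamental benchmark dataset for evaluating the map understanding ability of vision-language models | navigation
12 | OpenTrack3D: Towards Accurate and Generalizable Open-Vocabulary 3D Instance Segmentation | OpenTrack3D: accurate and generalizable open-vocabulary 3D instance segmentation | point cloud
13 | AfroBeats Dance Movement Analysis Using Computer Vision: A Proof-of-Concept Framework Combining YOLO and Segment Anything Model | Proposes an AfroBeats dance movement analysis framework that combines YOLO and SAM and requires no specialized equipment | pose estimation
14 | EEA: Exploration-Exploitation Agent for Long Video Understanding | Proposes EEA, an exploration-exploitation agent framework for long video understanding | navigation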

🔬 Pillar 2: RL Algorithms & Architecture (6 papers)

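# | Title | One-line summary | Tags | 🔗
15 | Cross-Stain Contrastive Learning for Paired Immunohistochemistry and Histopathology Slide Representation Learning | Proposes the Cross-Stain Contrastive Learning framework to address the alignment problem in representation learning for multi-stain pathology slides | representation learning, contrastive learning
16 | TempR1: Improving Temporal Understanding of MLLMs via Temporal-Aware Multi-Task Reinforcement Learning | Proposes TempR1, which improves MLLMs' temporal understanding of long videos through temporal-aware multi-task reinforcement learning | reinforcement learning, localization
17 | RELIC: Interactive Video World Model with Long-Horizon Memory | RELIC: an interactive video world model with long-horizon memory that enables real-time scene exploration | world model
18 | On the Temporality for Sketch Representation Learning | Studies the effect of temporality in sketch representation learning and identifies the best-performing modeling approach | representation learning
19 | Traffic Image Restoration under Adverse Weather via Frequency-Aware Mamba | Proposes Frequency-Aware Mamba (FAMamba) for traffic image restoration under adverse weather | Mamba
20 | Unique Lives, Shared World: Learning from Single-Life Videos | Proposes a single-life learning paradigm that learns general visual representations from an individual's life videos through self-supervision | representation learning, depth estimation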

🔬 Pillar 1: Robot Control (5 papers)

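# | Title | One-line summary | Tags | 🔗
21 | Gamma-from-Mono: Road-Relative, Metric, Self-Supervised Monocular Geometry for Vehicular Applications | Proposes Gamma-from-Mono for road-relative, metric, self-supervised monocular geometry estimation in vehicular applications | motion planning, depth estimation, monocular depth
22 | LAMP: Language-Assisted Motion Planning for Controllable Video Generation | LAMP: uses language-assisted motion planning to achieve controllable video generation | motion planning
23 | SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL | SpaceTools: strengthens tool-augmented spatial reasoning through double interactive reinforcement learning | manipulation, reinforcement learning
24 | VAT: Vision Action Transformer by Unlocking Full Representation of ViT | Proposes the Vision Action Transformer (VAT), which uses features from all ViT layers for robot action learning | manipulation, imitation learning
25 | PosA-VLA: Enhancing Action Generation via Pose-Conditioned Anchor Attention | PosA-VLA: enhances action generation in embodied tasks through pose-conditioned anchor attention | manipulation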

🔬 Pillar 4: Generative Motion (2 papers)

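# | Title | One-line summary | Tags | 🔗
26 | FloodDiffusion: Tailored Diffusion Forcing for Streaming Motion Generation | FloodDiffusion: a tailored diffusion forcing framework for streaming motion generation | motion generation
27 | UniMo: Unifying 2D Video and 3D Human Motion with an Autoregressive Framework | UniMo: proposes an autoregressive framework that jointly models 2D video and 3D human motion, enabling simultaneous generation and understanding | motion token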

🔬 Pillar 7: Motion Retargeting (1 paper)

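# | Title | One-line summary | Tags | 🔗
28 | Inference-time Stochastic Refinement of GRU-Normalizing Flow for Real-time Video Motion Transfer | Proposes GRU-SNF, which applies inference-time stochastic refinement to GRU-NF to produce diverse predictions in real-time video motion transfer | motion transfer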

🔬 Pillar 8: Physics-based Animation (1 paper)

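# | Title | One-line summary | Tags | 🔗
29 | PULSE: A Unified Multi-Task Architecture for Cardiac Segmentation, Diagnosis, and Few-Shot Cross-Modality Clinical Adaptation | PULSE: a unified multi-task architecture for cardiac segmentation, diagnosis, and few-shot cross-modality clinical adaptation | PULSE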
