cs.CV(2025-12-11)

📊 39 papers in total | 🔗 10 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception (Perception & SLAM) (25 🔗8) · Pillar 2: RL Algorithms and Architecture (RL & Architecture) (7 🔗1) · Pillar 1: Robot Control (3) · Pillar 4: Generative Motion (2 🔗1) · Pillar 7: Motion Retargeting (1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 3: Spatial Perception (Perception & SLAM) (25 papers)

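# | Title | One-sentence summary | Tags | 🔗
1 | Breaking the Vicious Cycle: Coherent 3D Gaussian Splatting from Sparse and Motion-Blurred Views | Proposes CoherentGS for high-fidelity 3D Gaussian reconstruction from sparse, motion-blurred views | 3D gaussian splatting, 3DGS, gaussian splatting
2 | Geo6DPose: Fast Zero-Shot 6D Object Pose Estimation via Geometry-Filtered Feature Matching | Geo6DPose: fast zero-shot 6D object pose estimation via geometry-filtered feature matching | pose estimation, feature matching
3 | Point2Pose: A Generative Framework for 3D Human Pose Estimation with Multi-View Point Cloud Dataset | Point2Pose: a generative framework for 3D human pose estimation built on a multi-view point cloud dataset | point cloud, pose estimation
4 | Empowering Dynamic Urban Navigation with Stereo and Mid-Level Vision | StereoWalker: combines stereo vision with mid-level vision to improve dynamic urban navigation | depth estimation, navigation
5 | Fast-FoundationStereo: Real-Time Zero-Shot Stereo Matching | Proposes Fast-FoundationStereo, achieving real-time, high-accuracy zero-shot stereo matching | stereo matching
6 | SceneMaker: Open-set 3D Scene Generation with Decoupled De-occlusion and Pose Estimation Model | SceneMaker: an open-set 3D scene generation framework that decouples de-occlusion from pose estimation | pose estimation
7 | GaussianHeadTalk: Wobble-Free 3D Talking Heads with Audio Driven Gaussian Splatting | Proposes GaussianHeadTalk, which uses audio-driven Gaussian splatting to generate wobble-free 3D talking heads | gaussian splatting
8 | PoseGAM: Robust Unseen Object Pose Estimation via Geometry-Aware Multi-View Reasoning | PoseGAM: robust unseen-object pose estimation via geometry-aware multi-view reasoning | pose estimation
9 | Optimal transport unlocks end-to-end learning for single-molecule localization | Uses optimal transport to enable end-to-end learning for single-molecule localization microscopy | localization
10 | NaviHydra: Controllable Navigation-guided End-to-end Autonomous Driving with Hydra-distillation | NaviHydra: controllable, navigation-guided end-to-end autonomous driving based on Hydra distillation | navigation
11 | Adaptive Dual-Weighted Gravitational Point Cloud Denoising Method | Proposes an adaptive dual-weighted gravitational point cloud denoising method that improves accuracy, efficiency, and edge preservation | point cloud
12 | RaLiFlow: Scene Flow Estimation with 4D Radar and LiDAR Point Clouds | Proposes RaLiFlow, the first scene flow estimation framework based on 4D radar and LiDAR point clouds | point cloud
13 | Efficient-VLN: A Training-Efficient Vision-Language Navigation Model | Efficient-VLN: a training-efficient vision-language navigation model that significantly reduces training overhead | navigation
14 | Physically Aware 360$^\circ$ View Generation from a Single Image using Disentangled Scene Embeddings | Proposes Disentangled360, which generates 360-degree views from a single image via disentangled scene embeddings | gaussian splatting, NeRF, scene reconstruction
15 | E-RayZer: Self-supervised 3D Reconstruction as Spatial Visual Pre-training | E-RayZer: proposes a self-supervised 3D reconstruction framework that serves as spatial visual pre-training | pose estimation, VGGT
16 | An M-Health Algorithmic Approach to Identify and Assess Physiotherapy Exercises in Real Time | Proposes a mobile-device-based m-health algorithm for identifying and assessing physiotherapy exercises in real time | pose estimation, localization
17 | THE-Pose: Topological Prior with Hybrid Graph Fusion for Estimating Category-Level 6D Object Pose | THE-Pose: category-level 6D pose estimation that fuses a topological prior with hybrid graphs | point cloud, pose estimation
18 | FloraForge: LLM-Assisted Procedural Generation of Editable and Analysis-Ready 3D Plant Geometric Models For Agricultural Applications | FloraForge: LLM-assisted generation of editable, analysis-ready 3D plant geometric models for agricultural applications | point cloud
19 | OmniView: An All-Seeing Diffusion Model for 3D and 4D View Synthesis | OmniView: a unified diffusion model for 3D and 4D view synthesis | novel view synthesis
20 | Any4D: Unified Feed-Forward Metric 4D Reconstruction | Any4D: a unified feed-forward framework for metric 4D reconstruction | ego-motion
21 | Video Depth Propagation | Proposes VeloDepth, which achieves efficient and robust video depth estimation through spatio-temporal priors and feature propagation | depth estimation
22 | Robust Shape from Focus via Multiscale Directional Dilated Laplacian and Recurrent Network | Proposes a robust shape-from-focus method based on a multiscale directional dilated Laplacian and a recurrent network | depth estimation
23 | Error-Propagation-Free Learned Video Compression With Dual-Domain Progressive Temporal Alignment | Proposes an error-propagation-free learned video compression framework with dual-domain progressive temporal alignment | optical flow
24 | CoSPlan: Corrective Sequential Planning via Scene Graph Incremental Updates | Proposes CoSPlan, a corrective sequential planning method based on incremental scene graph updates, to improve VLM reasoning on complex tasks | navigation
25 | Long-LRM++: Preserving Fine Details in Feed-Forward Wide-Coverage Reconstruction | Long-LRM++: combines a semi-explicit representation with a lightweight decoder for high-quality, real-time wide-coverage scene reconstruction | gaussian splatting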

🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (7 papers)

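# | Title | One-sentence summary | Tags | 🔗
26 | TransLocNet: Cross-Modal Attention for Aerial-Ground Vehicle Localization with Contrastive Learning | TransLocNet: aerial-ground vehicle localization based on cross-modal attention and contrastive learning | contrastive learning, localization
27 | Hybrid Transformer-Mamba Architecture for Weakly Supervised Volumetric Medical Segmentation | Proposes TranSamba, a hybrid Transformer-Mamba architecture for weakly supervised volumetric medical image segmentation | Mamba, state space model, localization
28 | Weakly Supervised Tuberculosis Localization in Chest X-rays through Knowledge Distillation | Weakly supervised tuberculosis localization in chest X-rays using knowledge distillation | teacher-student, localization
29 | WorldLens: Full-Spectrum Evaluations of Driving World Models in Real World | WorldLens: a full-spectrum evaluation benchmark for driving world models in the real world | world model, geometric consistency
30 | VDAWorld: World Modelling via VLM-Directed Abstraction and Simulation | VDAWorld: proposes a world modelling framework based on VLM-directed abstraction and simulation | world model, latent dynamics
31 | Latent Chain-of-Thought World Modeling for End-to-End Driving | Proposes Latent-CoT-Drive, which uses latent-space chain-of-thought for end-to-end autonomous driving decisions | reinforcement learning, world model
32 | Grounding Everything in Tokens for Multimodal Large Language Models | GETok: precise 2D spatial localization in multimodal large language models via tokenization | reinforcement learning, localization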

🔬 Pillar 1: Robot Control (3 papers)

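# | Title | One-sentence summary | Tags | 🔗
33 | XDen-1K: A Density Field Dataset of Real-World Objects | XDen-1K: the first large-scale density field dataset of real-world objects, supporting robot manipulation and physical simulation | manipulation
34 | RobustSora: De-Watermarked Benchmark for Robust AI-Generated Video Detection | RobustSora: proposes a de-watermarked benchmark to evaluate the robustness of AI-generated video detection | manipulation
35 | Feature Coding for Scalable Machine Vision | Proposes FCTM, which uses feature coding to significantly reduce the bandwidth required for edge deployment of machine vision | running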

🔬 Pillar 4: Generative Motion (2 papers)

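# | Title | One-sentence summary | Tags | 🔗
36 | IRG-MotionLLM: Interleaving Motion Generation, Assessment and Refinement for Text-to-Motion Generation | Proposes IRG-MotionLLM, which improves text-to-motion generation by interleaving motion generation, assessment, and refinement | text-to-motion, motion generation
37 | Topology-Agnostic Animal Motion Generation from Text Prompt | Proposes the OmniZoo dataset and a topology-agnostic animal motion generation framework, addressing heterogeneous skeletons and text-driven animal motion generation | text-driven motion, motion generation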

🔬 Pillar 7: Motion Retargeting (1 paper)

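# | Title | One-sentence summary | Tags | 🔗
38 | StereoSpace: Depth-Free Synthesis of Stereo Geometry via End-to-End Diffusion in a Canonical Space | StereoSpace: proposes a diffusion-based, depth-free framework for generating stereo images from monocular images | geometric consistency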

🔬 Pillar 8: Physics-based Animation (1 paper)

# | Title | One-sentence summary | Tags | 🔗
39 | 3D Blood Pulsation Maps | Proposes the Pulse3DFace dataset to address 3D blood pulsation mapping | PULSE
