cs.CV (2025-12-12)

📊 33 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception (Perception & SLAM) (20 🔗3) · Pillar 2: RL Algorithms and Architecture (RL & Architecture) (5) · Pillar 1: Robot Control (4) · Pillar 4: Generative Motion (2 🔗2) · Pillar 5: Interaction & Reaction (1) · Pillar 7: Motion Retargeting (1)

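🔬 Pillar 3: Spatial Perception (Perception & SLAM) (20 papers)

# | Title | One-sentence summary | Tags | 🔗
1 | Moment-Based 3D Gaussian Splatting: Resolving Volumetric Occlusion with Order-Independent Transmittance | Proposes moment-based 3D Gaussian splatting that resolves volumetric occlusion through order-independent transmittance | 3D gaussian splatting, 3DGS, gaussian splatting
2 | Prior-Enhanced Gaussian Splatting for Dynamic Scene Reconstruction from Casual Video | Proposes a prior-enhanced Gaussian splatting method for reconstructing dynamic scenes from casual video | gaussian splatting, scene reconstruction
3 | Lightweight 3D Gaussian Splatting Compression via Video Codec | Proposes a lightweight, video-codec-based compression method for 3D Gaussian splatting, suited to lightweight devices | 3D gaussian splatting, gaussian splatting
4 | MultiEgo: A Multi-View Egocentric Video Dataset for 4D Scene Reconstruction | Presents MultiEgo, a multi-view egocentric video dataset for 4D scene reconstruction | scene reconstruction, social interaction
5 | Super-Resolved Canopy Height Mapping from Sentinel-2 Time Series Using LiDAR HD Reference Data across Metropolitan France | Proposes THREASURE-Net, which uses Sentinel-2 time series and LiDAR data for high-resolution forest canopy height mapping | height map
6 | On Geometric Understanding and Learned Data Priors in VGGT | Analyzes VGGT's geometric understanding, revealing its implicit geometry learning and its reliance on data priors | VGGT
7 | Multi-task Learning with Extended Temporal Shift Module for Temporal Action Localization | Proposes a multi-task learning method with an extended temporal shift module for temporal action localization | localization
8 | Exploring Spatial-Temporal Representation via Star Graph for mmWave Radar-based Human Activity Recognition | Proposes a star-graph-based discrete dynamic graph neural network for mmWave radar human activity recognition | point cloud
9 | Particulate: Feed-Forward 3D Object Articulation | Particulate: proposes a feed-forward method for estimating 3D object articulation, with no per-object optimization | point cloud
10 | Structure From Tracking: Distilling Structure-Preserving Motion for Video Generation | Proposes SAM2VideoX, which improves video generation quality by distilling structure-preserving motion priors | optical flow
11 | Depth-Copy-Paste: Multimodal and Depth-Aware Compositing for Robust Face Detection | Proposes Depth-Copy-Paste, which improves face detection robustness through multimodal, depth-aware compositing | Depth Anything
12 | FactorPortrait: Controllable Portrait Animation via Disentangled Expression, Pose, and Viewpoint | FactorPortrait: controllable portrait animation through disentangled expression, pose, and viewpoint | novel view synthesis
13 | 3DTeethSAM: Taming SAM2 for 3D Teeth Segmentation | 3DTeethSAM: adapts SAM2 to 3D teeth segmentation in support of digital dentistry | localization
14 | Reconstruction as a Bridge for Event-Based Visual Question Answering | Proposes a reconstruction-based visual question answering framework for event cameras, addressing the compatibility gap between event data and multimodal large language models | scene understanding
15 | DOS: Distilling Observable Softmaps of Zipfian Prototypes for Self-Supervised Point Representation | DOS: distills observable soft labels from Zipfian prototypes for self-supervised point cloud representation learning | point cloud
16 | Collaborative Reconstruction and Repair for Multi-class Industrial Anomaly Detection | Proposes CRR, a collaborative reconstruction and repair network that addresses the identity mapping problem in multi-class industrial anomaly detection | localization
17 | Assisted Refinement Network Based on Channel Information Interaction for Camouflaged and Salient Object Detection | Proposes an assisted refinement network based on channel information interaction for camouflaged object detection and salient object detection | localization
18 | Surveillance Video-Based Traffic Accident Detection Using Transformer Architecture | Proposes a Transformer-based model for detecting accidents in traffic video and builds a large-scale balanced dataset | optical flow
19 | UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models | Proposes UFVideo for unified, multi-granularity cooperative video understanding, outperforming existing Video LLMs | localization
20 | SmokeBench: Evaluating Multimodal Large Language Models for Wildfire Smoke Detection | SmokeBench: evaluates multimodal large language models on wildfire smoke detection | localization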

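🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (5 papers)

# | Title | One-sentence summary | Tags | 🔗
21 | TSkel-Mamba: Temporal Dynamic Modeling via State Space Model for Human Skeleton-based Action Recognition | TSkel-Mamba: models temporal dynamics with a state space model for human skeleton-based action recognition | Mamba, SSM, state space model
22 | VFMF: World Modeling by Forecasting Vision Foundation Model Features | VFMF: world modeling by forecasting vision foundation model features | world model, flow matching
23 | Flowception: Temporally Expansive Flow Matching for Video Generation | Flowception: temporally expansive flow matching for variable-length video generation | flow matching
24 | Physics-Informed Video Flare Synthesis and Removal Leveraging Motion Independence between Flare and Scene | Proposes a physics-informed method for video flare synthesis and removal that accounts for the independent motion of flare and scene | Mamba, optical flow
25 | BAgger: Backwards Aggregation for Mitigating Drift in Autoregressive Video Diffusion Models | Proposes BAgger, which mitigates drift in autoregressive video diffusion models through backwards aggregation | world model, flow matching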

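🔬 Pillar 1: Robot Control (4 papers)

# | Title | One-sentence summary | Tags | 🔗
26 | FutureX: Enhance End-to-End Autonomous Driving via Latent Chain-of-Thought World Model | FutureX: enhances end-to-end autonomous driving with a latent chain-of-thought world model | motion planning, world model
27 | Semantic-Drive: Democratizing Long-Tail Data Curation via Open-Vocabulary Grounding and Neuro-Symbolic VLM Consensus | Semantic-Drive: long-tail data mining through open-vocabulary grounding and neuro-symbolic VLM consensus | walking
28 | V-RGBX: Video Editing with Accurate Controls over Intrinsic Properties | V-RGBX: the first end-to-end video editing framework with precise control over intrinsic properties | manipulation
29 | Embodied Image Compression | Proposes embodied image compression, addressing real-time task execution by embodied agents at low bitrates | manipulation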

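🔬 Pillar 4: Generative Motion (2 papers)

# | Title | One-sentence summary | Tags | 🔗
30 | Kinetic Mining in Context: Few-Shot Action Synthesis via Text-to-Motion Distillation | KineMIC: few-shot action synthesis through text-to-motion distillation, addressing data scarcity in HAR | text-to-motion
31 | KeyframeFace: From Text to Expressive Facial Keyframes | KeyframeFace: proposes a text-driven, interpretable framework for generating keyframe-based facial expression animation | motion synthesis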

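🔬 Pillar 5: Interaction & Reaction (1 paper)

# | Title | One-sentence summary | Tags | 🔗
32 | CARI4D: Category Agnostic 4D Reconstruction of Human-Object Interaction | CARI4D: proposes a category-agnostic 4D reconstruction method for human-object interaction, addressing the challenge of reconstruction from monocular RGB video | human-object interaction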

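🔬 Pillar 7: Motion Retargeting (1 paper)

# | Title | One-sentence summary | Tags | 🔗
33 | CADMorph: Geometry-Driven Parametric CAD Editing via a Plan-Generate-Verify Loop | CADMorph: proposes a geometry-driven parametric CAD editing framework that keeps geometric shape adjustments and parameter-sequence edits in sync during design iteration | structure preservation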
