cs.CV (2025-12-12)

📊 33 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception (Perception & SLAM) (20 🔗3) | Pillar 2: RL Algorithms and Architecture (5) | Pillar 1: Robot Control (4) | Pillar 4: Generative Motion (2 🔗2) | Pillar 5: Interaction & Reaction (1) | Pillar 7: Motion Retargeting (1)

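🔬 Pillar 3: Spatial Perception (Perception & SLAM) (20 papers)

#	Title	One-Sentence Summary	Tags	🔗
1	Moment-Based 3D Gaussian Splatting: Resolving Volumetric Occlusion with Order-Independent Transmittance	Proposes moment-based 3D Gaussian Splatting that resolves volumetric occlusion via order-independent transmittance	3D gaussian splatting 3DGS gaussian splatting
2	Prior-Enhanced Gaussian Splatting for Dynamic Scene Reconstruction from Casual Video	Proposes a prior-enhanced Gaussian Splatting method for reconstructing dynamic scenes from casual video	gaussian splatting scene reconstruction
3	Lightweight 3D Gaussian Splatting Compression via Video Codec	Proposes a lightweight 3D Gaussian Splatting compression method based on a video codec, suited to lightweight devices	3D gaussian splatting gaussian splatting	✅
4	MultiEgo: A Multi-View Egocentric Video Dataset for 4D Scene Reconstruction	Presents MultiEgo, a multi-view egocentric video dataset for 4D scene reconstruction	scene reconstruction social interaction
5	Super-Resolved Canopy Height Mapping from Sentinel-2 Time Series Using LiDAR HD Reference Data across Metropolitan France	Proposes THREASURE-Net, which uses Sentinel-2 time series and LiDAR data for high-resolution forest canopy height mapping	height map	✅
6	On Geometric Understanding and Learned Data Priors in VGGT	Analyzes VGGT's geometric understanding, revealing its implicit geometry learning and its reliance on data priors	VGGT
7	Multi-task Learning with Extended Temporal Shift Module for Temporal Action Localization	Proposes a multi-task learning method with an extended Temporal Shift Module for temporal action localization	localization
8	Exploring Spatial-Temporal Representation via Star Graph for mmWave Radar-based Human Activity Recognition	Proposes a star-graph-based discrete dynamic graph neural network for mmWave radar human activity recognition	point cloud
9	Particulate: Feed-Forward 3D Object Articulation	Particulate: a feed-forward method for estimating 3D object articulation without per-object optimization	point cloud
10	Structure From Tracking: Distilling Structure-Preserving Motion for Video Generation	Proposes SAM2VideoX, which distills structure-preserving motion priors to improve video generation quality	optical flow
11	Depth-Copy-Paste: Multimodal and Depth-Aware Compositing for Robust Face Detection	Proposes Depth-Copy-Paste, which makes face detection more robust through multimodal, depth-aware compositing	Depth Anything
12	FactorPortrait: Controllable Portrait Animation via Disentangled Expression, Pose, and Viewpoint	FactorPortrait: controllable portrait animation via disentangled expression, pose, and viewpoint	novel view synthesis
13	3DTeethSAM: Taming SAM2 for 3D Teeth Segmentation	3DTeethSAM: adapts SAM2 to 3D teeth segmentation for digital dentistry	localization
14	Reconstruction as a Bridge for Event-Based Visual Question Answering	Proposes a reconstruction-based visual question answering framework for event cameras, addressing the compatibility gap between event data and multimodal large language models	scene understanding
15	DOS: Distilling Observable Softmaps of Zipfian Prototypes for Self-Supervised Point Representation	DOS: distills observable softmaps of Zipfian prototypes for self-supervised point cloud representation learning	point cloud
16	Collaborative Reconstruction and Repair for Multi-class Industrial Anomaly Detection	Proposes CRR, a collaborative reconstruction and repair network, to address the identity mapping problem in multi-class industrial anomaly detection	localization
17	Assisted Refinement Network Based on Channel Information Interaction for Camouflaged and Salient Object Detection	Proposes an assisted refinement network based on channel information interaction for camouflaged and salient object detection	localization	✅
18	Surveillance Video-Based Traffic Accident Detection Using Transformer Architecture	Proposes a Transformer-based model for detecting accidents in traffic video and builds a large-scale balanced dataset	optical flow
19	UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models	Proposes UFVideo for unified multi-granularity cooperative video understanding, reportedly outperforming existing Video LLMs	localization
20	SmokeBench: Evaluating Multimodal Large Language Models for Wildfire Smoke Detection	SmokeBench: evaluates multimodal large language models on wildfire smoke detection	localization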

🔬 Pillar 2: RL Algorithms and Architecture (5 papers)

#	Title	One-Sentence Summary	Tags
21	TSkel-Mamba: Temporal Dynamic Modeling via State Space Model for Human Skeleton-based Action Recognition	TSkel-Mamba: temporal dynamic modeling with a state space model for human skeleton-based action recognition	Mamba SSM state space model
22	VFMF: World Modeling by Forecasting Vision Foundation Model Features	VFMF: world modeling by forecasting vision foundation model features	world model flow matching
23	Flowception: Temporally Expansive Flow Matching for Video Generation	Flowception: temporally expansive flow matching for variable-length video generation	flow matching
24	Physics-Informed Video Flare Synthesis and Removal Leveraging Motion Independence between Flare and Scene	Proposes a physics-informed method for video flare synthesis and removal that exploits the motion independence between flare and scene	Mamba optical flow
25	BAgger: Backwards Aggregation for Mitigating Drift in Autoregressive Video Diffusion Models	Proposes BAgger, which mitigates drift in autoregressive video diffusion models through backwards aggregation	world model flow matching

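🔬 Pillar 1: Robot Control (4 papers)

#	Title	One-Sentence Summary	Tags
26	FutureX: Enhance End-to-End Autonomous Driving via Latent Chain-of-Thought World Model	FutureX: enhances end-to-end autonomous driving with a latent chain-of-thought world model	motion planning world model
27	Semantic-Drive: Democratizing Long-Tail Data Curation via Open-Vocabulary Grounding and Neuro-Symbolic VLM Consensus	Semantic-Drive: long-tail data curation via open-vocabulary grounding and neuro-symbolic VLM consensus	walking
28	V-RGBX: Video Editing with Accurate Controls over Intrinsic Properties	V-RGBX: an end-to-end video editing framework with accurate control over intrinsic properties, described as the first of its kind	manipulation
29	Embodied Image Compression	Proposes embodied image compression to support real-time task execution by embodied agents at low bitrates	manipulation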

🔬 Pillar 4: Generative Motion (2 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
30	Kinetic Mining in Context: Few-Shot Action Synthesis via Text-to-Motion Distillation	KineMIC: few-shot action synthesis via text-to-motion distillation, addressing data scarcity in HAR	text-to-motion	✅
31	KeyframeFace: From Text to Expressive Facial Keyframes	KeyframeFace: a text-driven, interpretable framework for generating keyframe-based facial expression animation	motion synthesis	✅

🔬 Pillar 5: Interaction & Reaction (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
32	CARI4D: Category Agnostic 4D Reconstruction of Human-Object Interaction	CARI4D: a category-agnostic method for 4D reconstruction of human-object interaction from monocular RGB video	human-object interaction

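🔬 Pillar 7: Motion Retargeting (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
33	CADMorph: Geometry-Driven Parametric CAD Editing via a Plan-Generate-Verify Loop	CADMorph: a geometry-driven parametric CAD editing framework that keeps geometric shape changes and parameter-sequence edits in sync during design iteration	structure preservation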

