cs.CV (2025-12-27)

📊 18 papers total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (8 🔗1) Pillar 2: RL & Architecture (4 🔗2) Pillar 3: Perception & Semantics (2 🔗1) Pillar 4: Generative Motion (1) Pillar 1: Robot Control (1) Pillar 7: Motion Retargeting (1) Pillar 8: Physics-based Animation (1)

🔬 Pillar 9: Embodied Foundation Models (8 papers)

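#	Title	One-sentence summary	Tags	🔗	⭐
1	Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone	Proposes Dream-VL and Dream-VLA, built on a diffusion language model backbone, for vision-language understanding and robot control.	vision-language-action VLA large language model
2	Self-Rewarded Multimodal Coherent Reasoning Across Diverse Visual Domains	Proposes the SR-MCR framework, which uses a self-reward mechanism to improve the reasoning coherence and accuracy of multimodal LLMs across visual domains.	multimodal visual grounding
3	Multimodal Diffeomorphic Registration with Neural ODEs and Structural Descriptors	Proposes a multimodal diffeomorphic registration method based on neural ODEs and structural descriptors.	multimodal
4	SCAFusion: A Multimodal 3D Detection Framework for Small Object Detection in Lunar Surface Exploration	SCAFusion: a multimodal 3D detection framework for small-object detection on the lunar surface.	multimodal
5	CritiFusion: Semantic Critique and Spectral Alignment for Faithful Text-to-Image Generation	CritiFusion: semantic critique and spectral alignment for high-quality text-to-image generation.	large language model multimodal
6	Rethinking Memory Design in SAM-Based Visual Object Tracking	Proposes a unified hybrid memory framework for SAM-based tracking, improving robustness under long-term occlusion and in complex scenes.	foundation model	✅
7	DreamOmni3: Scribble-based Editing and Generation	DreamOmni3: proposes a scribble-based image editing and generation framework to address the limitations of text prompts.	multimodal
8	Unified Review and Benchmark of Deep Segmentation Architectures for Cardiac Ultrasound on CAMUS	Provides a unified evaluation and benchmark of deep learning architectures for cardiac ultrasound image segmentation.	multimodal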

🔬 Pillar 2: RL & Architecture (4 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
9	Autoregressive Flow Matching for Motion Prediction	Proposes ARFM, an autoregressive flow matching model, for long-horizon motion trajectory prediction.	flow matching human motion human motion prediction	✅
10	FinPercep-RM: A Fine-grained Reward Model and Co-evolutionary Curriculum for RL-based Real-world Super-Resolution	Proposes FinPercep-RM and CCL to improve perceptual quality in RL-based real-world super-resolution and suppress reward hacking.	reinforcement learning policy learning RLHF
11	MEGA-PCC: A Mamba-based Efficient Approach for Joint Geometry and Attribute Point Cloud Compression	MEGA-PCC: an efficient Mamba-based method for joint compression of point cloud geometry and attributes.	Mamba
12	Tracking by Predicting 3-D Gaussians Over Time	Proposes Video-GMAE, which learns video representations and tracks objects by predicting how 3D Gaussians evolve over time.	representation learning masked autoencoder	✅

🔬 Pillar 3: Perception & Semantics (2 papers)

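#	Title	One-sentence summary	Tags	🔗	⭐
13	SAM 3D for 3D Object Reconstruction from Remote Sensing Images	Proposes SAM 3D for 3D building reconstruction from remote sensing images, improving roof geometry consistency and boundary sharpness.	scene reconstruction sam 3D SAM 3D
14	Visual Autoregressive Modelling for Monocular Depth Estimation	Proposes a monocular depth estimation method based on a visual autoregressive prior, improving depth prediction accuracy in indoor and outdoor scenes.	depth estimation monocular depth classifier-free guidance	✅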

🔬 Pillar 4: Generative Motion (1 paper)

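#	Title	One-sentence summary	Tags	🔗	⭐
15	Pose-Guided Residual Refinement for Interpretable Text-to-Motion Generation and Editing	Proposes a pose-guided residual refinement method that improves the interpretability and fidelity of text-to-motion generation and editing.	text-to-motion motion generation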

🔬 Pillar 1: Robot Control (1 paper)

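#	Title	One-sentence summary	Tags	🔗	⭐
16	Envision: Embodied Visual Planning via Goal-Imagery Video Diffusion	Envision: an embodied visual planning framework based on goal-imagery video diffusion that addresses spatial drift and goal inconsistency.	manipulation physically plausible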

🔬 Pillar 7: Motion Retargeting (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
17	SuperiorGAT: Graph Attention Networks for Sparse LiDAR Point Cloud Reconstruction in Autonomous Systems	SuperiorGAT: a graph attention network for sparse LiDAR point cloud reconstruction in autonomous driving systems.	geometric consistency

🔬 Pillar 8: Physics-based Animation (1 paper)

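#	Title	One-sentence summary	Tags	🔗	⭐
18	Event-based high temporal resolution measurement of shock wave motion field	Proposes an event-camera-based framework for measuring shock wave motion fields with high spatiotemporal resolution.	spatiotemporal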
