cs.CV (2025-03-28)

📊 33 papers in total | 🔗 9 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (13 🔗4) · Pillar 9: Embodied Foundation Models (7 🔗2) · Pillar 2: RL Algorithms and Architecture (RL & Architecture) (6 🔗2) · Pillar 1: Robot Control (3 🔗1) · Pillar 4: Generative Motion (2) · Pillar 6: Video Extraction and Matching (Video Extraction) (1) · Pillar 5: Interaction and Reaction (Interaction & Reaction) (1)

🔬 Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (13 papers)

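#	Title	One-sentence summary	Tags	🔗	⭐
1	ABC-GS: Alignment-Based Controllable Style Transfer for 3D Gaussian Splatting	ABC-GS: alignment-based controllable style transfer for 3D Gaussian splatting	3D gaussian splatting gaussian splatting splatting	✅
2	Segment then Splat: Unified 3D Open-Vocabulary Segmentation via Gaussian Splatting	Proposes Segment then Splat, achieving unified 3D open-vocabulary segmentation via Gaussian splatting	gaussian splatting splatting open-vocabulary
3	EndoLRMGS: Complete Endoscopic Scene Reconstruction combining Large Reconstruction Modelling and Gaussian Splatting	EndoLRMGS: complete endoscopic scene reconstruction combining a large reconstruction model with Gaussian splatting	depth estimation gaussian splatting splatting
4	AH-GS: Augmented 3D Gaussian Splatting for High-Frequency Detail Representation	AH-GS: augments the high-frequency detail representation of 3D Gaussian splatting to improve rendering fidelity	3D gaussian splatting gaussian splatting splatting
5	TranSplat: Instant Cross-Scene Object Relighting in Gaussian Splatting via Spherical Harmonic Transfer	TranSplat: instant cross-scene object relighting in Gaussian splatting via spherical harmonic transfer	3D gaussian splatting gaussian splatting splatting
6	One Look is Enough: Seamless Patchwise Refinement for Zero-Shot Monocular Depth Estimation on High-Resolution Images	Proposes the PRO framework, which performs zero-shot monocular depth estimation on high-resolution images through seamless patchwise refinement	depth estimation monocular depth
7	NuGrounding: A Multi-View 3D Visual Grounding Framework in Autonomous Driving	NuGrounding: a multi-view 3D visual grounding framework for autonomous driving that addresses the coarse granularity of instructions	scene understanding visual grounding
8	Deep Depth Estimation from Thermal Image: Dataset, Benchmark, and Challenges	Proposes MS$^2$, a large-scale multi-spectral stereo dataset, and establishes a benchmark for depth estimation from thermal images	depth estimation stereo depth	✅
9	VoteFlow: Enforcing Local Rigidity in Self-Supervised Scene Flow	VoteFlow enforces local rigidity constraints in self-supervised scene flow through a differentiable voting module	scene flow	✅
10	Follow Your Motion: A Generic Temporal Consistency Portrait Editing Framework with Trajectory Guidance	Proposes the FYM framework, which achieves temporally consistent portrait editing through trajectory guidance	3D gaussian splatting gaussian splatting splatting
11	SemAlign3D: Semantic Correspondence between RGB-Images through Aligning 3D Object-Class Representations	SemAlign3D: semantic correspondence between RGB images by aligning 3D object-class representations	monocular depth
12	MVSAnywhere: Zero-Shot Multi-View Stereo	MVSAnywhere: proposes a zero-shot multi-view stereo method that generalizes across scenes and depth ranges	depth estimation
13	Segment Any Motion in Videos	Proposes a method for segmenting moving objects in videos that combines long-range trajectory motion cues with semantic features	optical flow	✅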

🔬 Pillar 9: Embodied Foundation Models (7 papers)

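#	Title	One-sentence summary	Tags	🔗	⭐
14	A Survey on Remote Sensing Foundation Models: From Vision to Multimodality	Survey of foundation models for remote sensing: progress and challenges from vision to multimodality	foundation model multimodal	✅
15	AutoComPose: Automatic Generation of Pose Transition Descriptions for Composed Pose Retrieval Using Multimodal LLMs	AutoComPose: uses multimodal LLMs to automatically generate pose transition descriptions for composed pose retrieval	large language model multimodal
16	RUNA: Object-level Out-of-Distribution Detection via Regional Uncertainty Alignment of Multimodal Representations	RUNA: object-level out-of-distribution detection via regional uncertainty alignment of multimodal representations	multimodal
17	Learning to Instruct for Visual Instruction Tuning	Proposes L2T to improve visual instruction tuning and address overfitting and shortcut learning	multimodal instruction following	✅
18	DeepSound-V1: Start to Think Step-by-Step in the Audio Generation from Videos	DeepSound-V1: uses the chain-of-thought reasoning of multimodal LLMs to improve the synchronization and quality of audio generated from video	large language model chain-of-thought
19	Breaking Language Barriers in Visual Language Models via Multilingual Textual Regularization	Proposes a multilingual textual regularization method to address image-induced fidelity loss in visual language models	multimodal
20	SCHNet: SAM Marries CLIP for Human Parsing	SCHNet: combines SAM and CLIP to improve human parsing performance	foundation model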

🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (6 papers)

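#	Title	One-sentence summary	Tags	🔗	⭐
21	Intrinsic Image Decomposition for Robust Self-supervised Monocular Depth Estimation on Reflective Surfaces	Proposes a self-supervised monocular depth estimation method based on intrinsic image decomposition, improving depth prediction accuracy on reflective surfaces	distillation depth estimation monocular depth
22	EchoFlow: A Foundation Model for Cardiac Ultrasound Image and Video Generation	EchoFlow: a privacy-preserving foundation model for generating high-quality cardiac ultrasound images and videos	flow matching foundation model	✅
23	Q-Insight: Understanding Image Quality via Visual Reinforcement Learning	Q-Insight: an image quality understanding model based on visual reinforcement learning	reinforcement learning large language model	✅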
24	Enhance Generation Quality of Flow Matching V2A Model via Multi-Step CoT-Like Guidance and Combined Preference Optimization	Proposes a flow matching V2A model based on multi-step CoT-like guidance and combined preference optimization, improving audio generation quality	flow matching chain-of-thought
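25	Dataset Distillation of 3D Point Clouds via Distribution Matching	Proposes a distribution-matching-based dataset distillation method for 3D point clouds, improving training performance with small datasets	distillation
26	DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness	Proposes the DSO framework, which aligns 3D generators using simulation feedback to improve the physical soundness of generated objects	DPO direct preference optimization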

🔬 Pillar 1: Robot Control (3 papers)

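#	Title	One-sentence summary	Tags	🔗	⭐
27	Detecting Localized Deepfake Manipulations Using Action Unit-Guided Video Representations	Proposes an action unit-guided video representation method for detecting localized deepfake manipulations	manipulation spatiotemporal
28	Scalable heliostat surface predictions from focal spots: Sim-to-Real transfer of inverse Deep Learning Raytracing	Proposes a sim-to-real method based on inverse deep learning raytracing, enabling scalable heliostat surface prediction	sim-to-real MAE
29	Multi-modal Knowledge Distillation-based Human Trajectory Forecasting	Proposes a multimodal knowledge distillation framework for human trajectory forecasting, improving prediction accuracy in resource-constrained settings	locomotion distillation	✅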

🔬 Pillar 4: Generative Motion (2 papers)

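#	Title	One-sentence summary	Tags	🔗	⭐
30	GAITGen: Disentangled Motion-Pathology Impaired Gait Generative Model -- Bringing Motion Generation to the Clinical Domain	GAITGen: a disentangled motion-pathology gait generative model that brings motion generation into the clinical domain	motion generation
31	SIGHT: Synthesizing Image-Text Conditioned and Geometry-Guided 3D Hand-Object Trajectories	SIGHT: proposes an image-text conditioned, geometry-guided method for generating 3D hand-object interaction trajectories	physically plausible embodied AI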

🔬 Pillar 6: Video Extraction and Matching (Video Extraction) (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
32	EgoToM: Benchmarking Theory of Mind Reasoning from Egocentric Videos	EgoToM: proposes a benchmark for theory-of-mind reasoning from egocentric videos	egocentric Ego4D large language model

🔬 Pillar 5: Interaction and Reaction (Interaction & Reaction) (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
33	SocialGen: Modeling Multi-Human Social Interaction with Language Models	SocialGen: proposes a language-model-based method for modeling multi-human social interaction	two-person interaction
