cs.CV(2025-03-28)

📊 33 papers in total | 🔗 9 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (13 🔗4) · Pillar 9: Embodied Foundation Models (7 🔗2) · Pillar 2: RL & Architecture (6 🔗2) · Pillar 1: Robot Control (3 🔗1) · Pillar 4: Generative Motion (2) · Pillar 6: Video Extraction (1) · Pillar 5: Interaction & Reaction (1)

🔬 Pillar 3: Perception & Semantics (13 papers)

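| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | ABC-GS: Alignment-Based Controllable Style Transfer for 3D Gaussian Splatting | ABC-GS: alignment-based controllable style transfer for 3D Gaussian splatting | 3D gaussian splatting, gaussian splatting, splatting | |
| 2 | Segment then Splat: Unified 3D Open-Vocabulary Segmentation via Gaussian Splatting | Proposes Segment then Splat, which achieves unified 3D open-vocabulary segmentation via Gaussian splatting | gaussian splatting, splatting, open-vocabulary | |
| 3 | EndoLRMGS: Complete Endoscopic Scene Reconstruction combining Large Reconstruction Modelling and Gaussian Splatting | EndoLRMGS: complete endoscopic scene reconstruction combining a large reconstruction model with Gaussian splatting | depth estimation, gaussian splatting, splatting | |
| 4 | AH-GS: Augmented 3D Gaussian Splatting for High-Frequency Detail Representation | AH-GS: augments 3D Gaussian splatting to represent high-frequency detail and improve rendering fidelity | 3D gaussian splatting, gaussian splatting, splatting | |
| 5 | TranSplat: Instant Cross-Scene Object Relighting in Gaussian Splatting via Spherical Harmonic Transfer | TranSplat: instant cross-scene object relighting in Gaussian splatting via spherical harmonic transfer | 3D gaussian splatting, gaussian splatting, splatting | |
| 6 | One Look is Enough: Seamless Patchwise Refinement for Zero-Shot Monocular Depth Estimation on High-Resolution Images | Proposes the PRO framework, which uses seamless patchwise refinement for zero-shot monocular depth estimation on high-resolution images | depth estimation, monocular depth | |
| 7 | NuGrounding: A Multi-View 3D Visual Grounding Framework in Autonomous Driving | NuGrounding: a multi-view 3D visual grounding framework for autonomous driving that addresses coarse-grained instructions | scene understanding, visual grounding | |
| 8 | Deep Depth Estimation from Thermal Image: Dataset, Benchmark, and Challenges | Proposes the large-scale multi-spectral stereo dataset MS$^2$ and establishes a benchmark for depth estimation from thermal images | depth estimation, stereo depth | |
| 9 | VoteFlow: Enforcing Local Rigidity in Self-Supervised Scene Flow | VoteFlow enforces local rigidity constraints in self-supervised scene flow through a differentiable voting module | scene flow | |
| 10 | Follow Your Motion: A Generic Temporal Consistency Portrait Editing Framework with Trajectory Guidance | Proposes the FYM framework, which uses trajectory guidance for temporally consistent portrait editing | 3D gaussian splatting, gaussian splatting, splatting | |
| 11 | SemAlign3D: Semantic Correspondence between RGB-Images through Aligning 3D Object-Class Representations | SemAlign3D: semantic correspondence between RGB images by aligning 3D object-class representations | monocular depth | |
| 12 | MVSAnywhere: Zero-Shot Multi-View Stereo | MVSAnywhere: proposes a zero-shot multi-view stereo method that generalizes across scenes and depth ranges | depth estimation | |
| 13 | Segment Any Motion in Videos | Proposes a moving-object segmentation method for videos that combines long-range trajectory motion cues with semantic features | optical flow | |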

🔬 Pillar 9: Embodied Foundation Models (7 papers)

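| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 14 | A Survey on Remote Sensing Foundation Models: From Vision to Multimodality | A survey of remote sensing foundation models: progress and challenges from vision to multimodality | foundation model, multimodal | |
| 15 | AutoComPose: Automatic Generation of Pose Transition Descriptions for Composed Pose Retrieval Using Multimodal LLMs | AutoComPose: uses multimodal LLMs to automatically generate pose transition descriptions for composed pose retrieval | large language model, multimodal | |
| 16 | RUNA: Object-level Out-of-Distribution Detection via Regional Uncertainty Alignment of Multimodal Representations | RUNA: object-level out-of-distribution detection via regional uncertainty alignment of multimodal representations | multimodal | |
| 17 | Learning to Instruct for Visual Instruction Tuning | Proposes L2T to improve visual instruction tuning, addressing overfitting and shortcut learning | multimodal, instruction following | |
| 18 | DeepSound-V1: Start to Think Step-by-Step in the Audio Generation from Videos | DeepSound-V1: uses the chain-of-thought of multimodal LLMs to improve the synchronization and quality of video-to-audio generation | large language model, chain-of-thought | |
| 19 | Breaking Language Barriers in Visual Language Models via Multilingual Textual Regularization | Proposes a multilingual textual regularization method that addresses image-induced fidelity loss in visual language models | multimodal | |
| 20 | SCHNet: SAM Marries CLIP for Human Parsing | SCHNet: combines SAM and CLIP to improve human parsing performance | foundation model | |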

🔬 Pillar 2: RL & Architecture (6 papers)

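| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 21 | Intrinsic Image Decomposition for Robust Self-supervised Monocular Depth Estimation on Reflective Surfaces | Proposes a self-supervised monocular depth estimation method based on intrinsic image decomposition, improving depth prediction accuracy on reflective surfaces | distillation, depth estimation, monocular depth | |
| 22 | EchoFlow: A Foundation Model for Cardiac Ultrasound Image and Video Generation | EchoFlow: a privacy-preserving foundation model for generating high-quality cardiac ultrasound images and videos | flow matching, foundation model | |
| 23 | Q-Insight: Understanding Image Quality via Visual Reinforcement Learning | Q-Insight: an image quality understanding model based on visual reinforcement learning | reinforcement learning, large language model | |
| 24 | Enhance Generation Quality of Flow Matching V2A Model via Multi-Step CoT-Like Guidance and Combined Preference Optimization | Proposes a flow matching V2A model based on CoP guidance and combined preference optimization, improving audio generation quality | flow matching, chain-of-thought | |
| 25 | Dataset Distillation of 3D Point Clouds via Distribution Matching | Proposes a distribution-matching-based dataset distillation method for 3D point clouds, improving training performance on small-scale datasets | distillation | |
| 26 | DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness | Proposes the DSO framework, which aligns 3D generators using simulation feedback to improve the physical plausibility of generated objects | DPO, direct preference optimization | |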

🔬 Pillar 1: Robot Control (3 papers)

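| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 27 | Detecting Localized Deepfake Manipulations Using Action Unit-Guided Video Representations | Proposes an action-unit-guided video representation method for detecting localized deepfake manipulations | manipulation, spatiotemporal | |
| 28 | Scalable heliostat surface predictions from focal spots: Sim-to-Real transfer of inverse Deep Learning Raytracing | Proposes a sim-to-real method based on inverse deep learning raytracing for scalable heliostat surface prediction | sim-to-real, MAE | |
| 29 | Multi-modal Knowledge Distillation-based Human Trajectory Forecasting | Proposes a human trajectory forecasting framework based on multi-modal knowledge distillation, improving prediction accuracy in resource-constrained settings | locomotion, distillation | |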

🔬 Pillar 4: Generative Motion (2 papers)

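| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 30 | GAITGen: Disentangled Motion-Pathology Impaired Gait Generative Model -- Bringing Motion Generation to the Clinical Domain | GAITGen: a disentangled motion-pathology gait generative model that brings motion generation to the clinical domain | motion generation | |
| 31 | SIGHT: Synthesizing Image-Text Conditioned and Geometry-Guided 3D Hand-Object Trajectories | SIGHT: proposes an image-text conditioned, geometry-guided method for generating 3D hand-object interaction trajectories | physically plausible, embodied AI | |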

🔬 Pillar 6: Video Extraction (1 paper)

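| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 32 | EgoToM: Benchmarking Theory of Mind Reasoning from Egocentric Videos | EgoToM: proposes a benchmark for theory-of-mind reasoning from egocentric videos | egocentric, Ego4D, large language model | |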

🔬 Pillar 5: Interaction & Reaction (1 paper)

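| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 33 | SocialGen: Modeling Multi-Human Social Interaction with Language Models | SocialGen: proposes a language-model-based method for modeling multi-human social interaction | two-person interaction | |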
