cs.CV (2025-09-14)

📊 22 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (9) · Pillar 9: Embodied Foundation Models (5 🔗1) · Pillar 2: RL & Architecture (4 🔗1) · Pillar 7: Motion Retargeting (2 🔗1) · Pillar 8: Physics-based Animation (1) · Pillar 1: Robot Control (1)

🔬 Pillar 3: Perception & Semantics (9 papers)

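# | Title | One-sentence summary | Tags | 🔗
1 | ROSGS: Relightable Outdoor Scenes With Gaussian Splatting | ROSGS: relightable outdoor scene reconstruction using Gaussian splatting. | 3D gaussian splatting, 3DGS, gaussian splatting
2 | SVR-GS: Spatially Variant Regularization for Probabilistic Masks in 3D Gaussian Splatting | SVR-GS: optimizes probabilistic masks in 3D Gaussian splatting through spatially variant regularization. | 3D gaussian splatting, 3DGS, gaussian splatting
3 | Multispectral-NeRF: a multispectral modeling approach based on neural radiance fields | Proposes Multispectral-NeRF for NeRF-based 3D reconstruction from multispectral data. | NeRF, neural radiance field
4 | On the Skinning of Gaussian Avatars | Proposes a skinning method for Gaussian avatars based on weighted rotation blending, improving animation realism. | gaussian splatting, splatting, neural radiance field
5 | In-Vivo Skin 3-D Surface Reconstruction and Wrinkle Depth Estimation using Handheld High Resolution Tactile Sensing | Proposes a method for 3D skin reconstruction and wrinkle depth estimation based on handheld high-resolution tactile sensing. | depth estimation
6 | The System Description of CPS Team for Track on Driving with Language of CVPR 2024 Autonomous Grand Challenge | The CPS team proposes a vision-language model built on LLaVA fine-tuning and depth information fusion for the CVPR 2024 Autonomous Grand Challenge. | depth estimation, chain-of-thought
7 | Modality-Aware Infrared and Visible Image Fusion with Target-Aware Supervision | Proposes FusionNet, which fuses infrared and visible images using modality-aware and target-aware supervision. | scene understanding
8 | UnLoc: Leveraging Depth Uncertainties for Floorplan Localization | UnLoc: uses depth uncertainty for indoor floorplan localization. | monocular depth
9 | No Mesh, No Problem: Estimating Coral Volume and Surface from Sparse Multi-View Images | Proposes a lightweight framework for estimating coral volume and surface from sparse multi-view images. | VGGT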

🔬 Pillar 9: Embodied Foundation Models (5 papers)

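# | Title | One-sentence summary | Tags | 🔗
10 | Contextualized Multimodal Lifelong Person Re-Identification in Hybrid Clothing States | Proposes the CMLReID framework to address lifelong person re-identification under hybrid clothing states. | multimodal
11 | Rate-Distortion Limits for Multimodal Retrieval: Theory, Optimal Codes, and Finite-Sample Guarantees | Establishes information-theoretic limits for multimodal retrieval, proposes optimal coding schemes, and provides finite-sample guarantees. | multimodal
12 | Pathological Truth Bias in Vision-Language Models | Proposes MATS to evaluate the truth bias of vision-language models under visual contradiction, revealing and localizing model failure points. | multimodal
13 | Leveraging Geometric Priors for Unaligned Scene Change Detection | Proposes an unaligned scene change detection method based on geometric priors, improving robustness to viewpoint changes. | foundation model
14 | MIS-LSTM: Multichannel Image-Sequence LSTM for Sleep Quality and Stress Prediction | Proposes the MIS-LSTM model, which combines a CNN and an LSTM for sleep quality and stress prediction. | multimodal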

🔬 Pillar 2: RL & Architecture (4 papers)

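# | Title | One-sentence summary | Tags | 🔗
15 | Mars Traversability Prediction: A Multi-modal Self-supervised Approach for Costmap Generation | Proposes a multimodal self-supervised method for generating terrain traversability costmaps for Mars rovers. | MAE, traversability
16 | End-to-End Visual Autonomous Parking via Control-Aided Attention | Proposes CAA-Policy, an attention mechanism guided by control signals, for end-to-end visual autonomous parking. | imitation learning, motion prediction
17 | MultiMAE for Brain MRIs: Robustness to Missing Inputs Using Multi-Modal Masked Autoencoder | MultiMAE for brain MRI: uses a multimodal masked autoencoder to improve robustness to missing inputs. | masked autoencoder, MAE
18 | MixANT: Observation-dependent Memory Propagation for Stochastic Dense Action Anticipation | MixANT: observation-dependent memory propagation for stochastic dense action anticipation. | Mamba, SSM, state space model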

🔬 Pillar 7: Motion Retargeting (2 papers)

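# | Title | One-sentence summary | Tags | 🔗
19 | Motion Estimation for Multi-Object Tracking using KalmanNet with Semantic-Independent Encoding | Proposes the Semantic-Independent KalmanNet (SIKNet), improving the accuracy and robustness of motion estimation in multi-object tracking. | motion estimation
20 | Beyond Frame-wise Tracking: A Trajectory-based Paradigm for Efficient Point Cloud Tracking | Proposes the trajectory-based TrajTrack, improving the efficiency and accuracy of LiDAR point cloud single-object tracking. | motion estimation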

🔬 Pillar 8: Physics-based Animation (1 paper)

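# | Title | One-sentence summary | Tags | 🔗
21 | Traffic-MLLM: A Spatio-Temporal MLLM with Retrieval-Augmented Generation for Causal Inference in Traffic | Traffic-MLLM: a spatio-temporal multimodal large language model with retrieval-augmented generation for causal inference in traffic. | spatiotemporal, large language model, multimodal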

🔬 Pillar 1: Robot Control (1 paper)

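# | Title | One-sentence summary | Tags | 🔗
22 | Beyond Sliders: Mastering the Art of Diffusion-based Image Manipulation | Proposes Beyond Sliders, which combines GANs and diffusion models to improve real-image editing quality. | manipulation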
