cs.CV (2025-11-13)

📊 27 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

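Pillar 3: Spatial Perception (Perception & SLAM) (18 🔗2) · Pillar 2: RL Algorithms and Architecture (RL & Architecture) (4 🔗1) · Pillar 1: Robot Control (Robot Control) (2 🔗1) · Pillar 5: Interaction and Reaction (Interaction & Reaction) (2) · Pillar 4: Generative Motion (Generative Motion) (1)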

🔬 Pillar 3: Spatial Perception (Perception & SLAM) (18 papers)

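| # | Title | One-line summary | Tags | 🔗 |
| 1 | Depth-Consistent 3D Gaussian Splatting via Physical Defocus Modeling and Multi-View Geometric Supervision | Proposes a depth-consistent 3D Gaussian Splatting method based on physical defocus modeling and multi-view geometric supervision. | depth estimation, monocular depth, 3D gaussian splatting | |
| 2 | AHA! Animating Human Avatars in Diverse Scenes with Gaussian Splatting | Proposes a Gaussian Splatting-based human animation framework for realistic free-viewpoint rendering of humans in scenes. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 3 | TSPE-GS: Probabilistic Depth Extraction for Semi-Transparent Surface Reconstruction via 3D Gaussian Splatting | TSPE-GS: a probabilistic depth extraction method for semi-transparent surfaces based on 3D Gaussian Splatting. | 3D gaussian splatting, gaussian splatting | |
| 4 | OmniVGGT: Omni-Modality Driven Visual Geometry Grounded Transformer | OmniVGGT: an omni-modality driven visual geometry grounded Transformer that improves performance on 3D vision tasks. | depth estimation, point cloud, pose estimation | |
| 5 | GFT: Graph Feature Tuning for Efficient Point Cloud Analysis | Proposes Graph Feature Tuning (GFT) for efficient point cloud analysis with a significantly reduced parameter count. | point cloud | |
| 6 | MSGNav: Unleashing the Power of Multi-modal 3D Scene Graph for Zero-Shot Embodied Navigation | Proposes MSGNav, a multi-modal 3D scene graph approach to zero-shot embodied navigation. | navigation | |
| 7 | RWKV-PCSSC: Exploring RWKV Model for Point Cloud Semantic Scene Completion | Proposes RWKV-PCSSC, which uses the RWKV mechanism for lightweight, efficient point cloud semantic scene completion. | point cloud | |
| 8 | IPCD: Intrinsic Point-Cloud Decomposition | Proposes IPCD for intrinsic decomposition of point clouds, enabling applications such as lighting editing and texture modification. | point cloud | |
| 9 | Towards Blind and Low-Vision Accessibility of Lightweight VLMs and Custom LLM-Evals | Evaluates the accessibility of lightweight VLMs in video understanding for blind and low-vision users and proposes a customized evaluation framework. | navigation, social interaction | |
| 10 | RobIA: Robust Instance-aware Continual Test-time Adaptation for Deep Stereo | Proposes the RobIA framework for robust, instance-aware continual test-time adaptation in deep stereo matching. | depth estimation, stereo depth | |
| 11 | LiNeXt: Revisiting LiDAR Completion with Efficient Non-Diffusion Architectures | LiNeXt: proposes an efficient non-diffusion architecture that speeds up LiDAR point cloud completion and improves accuracy. | point cloud | |
| 12 | Split-Layer: Enhancing Implicit Neural Representation by Maximizing the Dimensionality of Feature Space | Proposes Split-Layer to raise the feature-space dimensionality of implicit neural representations and strengthen their representational capacity. | novel view synthesis | |
| 13 | Toward bilipshiz geometric models | Proposes neural network models for 3D point clouds that preserve bi-Lipschitz geometric structure. | point cloud | |
| 14 | LoG3D: Ultra-High-Resolution 3D Shape Modeling via Local-to-Global Partitioning | LoG3D: ultra-high-resolution 3D shape modeling via local-to-global partitioning. | point cloud | |
| 15 | AffordBot: 3D Fine-grained Embodied Reasoning via Multimodal Large Language Models | AffordBot: fine-grained 3D embodied reasoning using multimodal large language models. | point cloud | |
| 16 | DBGroup: Dual-Branch Point Grouping for Weakly Supervised 3D Semantic Instance Segmentation | Proposes DBGroup, a dual-branch point grouping network for weakly supervised 3D semantic instance segmentation. | scene understanding | |
| 17 | MosaicDoc: A Large-Scale Bilingual Benchmark for Visually Rich Document Understanding | Proposes MosaicDoc, a large-scale bilingual benchmark for visually rich document understanding that addresses limitations of existing benchmarks. | localization | |
| 18 | HCC-3D: Hierarchical Compensatory Compression for 98% 3D Token Reduction in Vision-Language Models | Proposes HCC-3D, which achieves a 98% token reduction in 3D vision-language models through hierarchical compensatory compression. | point cloud | |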

🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (4 papers)

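| # | Title | One-line summary | Tags | 🔗 |
| 19 | Depth Anything 3: Recovering the Visual Space from Any Views | Depth Anything 3: recovers spatial geometry from arbitrary views without architectural specialization. | teacher-student, depth estimation, monocular depth | |
| 20 | Multitask GLocal OBIA-Mamba for Sentinel-2 Landcover Mapping | Proposes a multitask GLocal OBIA-Mamba model that improves Sentinel-2 land cover classification accuracy. | Mamba | |
| 21 | PROPA: Toward Process-level Optimization in Visual Reasoning via Reinforcement Learning | Proposes the PROPA framework, which uses reinforcement learning to address process-level dependencies in visual reasoning. | reinforcement learning | |
| 22 | Do Blind Spots Matter for Word-Referent Mapping? A Computational Study with Infant Egocentric Video | Proposes a blind-spot-aware self-supervised visual representation learning method to improve word-referent mapping in infant egocentric video. | masked autoencoder, contrastive learning | |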

🔬 Pillar 1: Robot Control (Robot Control) (2 papers)

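| # | Title | One-line summary | Tags | 🔗 |
| 23 | Next-Frame Feature Prediction for Multimodal Deepfake Detection and Temporal Localization | Proposes a multimodal deepfake detection and temporal localization method based on next-frame feature prediction. | manipulation, localization | |
| 24 | SemanticVLA: Semantic-Aligned Sparsification and Enhancement for Efficient Robotic Manipulation | SemanticVLA: semantic-aligned sparsification and enhancement for efficient robotic manipulation. | manipulation | |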

🔬 Pillar 5: Interaction and Reaction (Interaction & Reaction) (2 papers)

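| # | Title | One-line summary | Tags | 🔗 |
| 25 | Dynamic Avatar-Scene Rendering from Human-centric Context | Proposes a Separate-then-Map strategy for neural rendering of dynamic human-scene interaction in monocular video. | human-scene interaction | |
| 26 | VISTA: A Vision and Intent-Aware Social Attention Framework for Multi-Agent Trajectory Prediction | VISTA: a vision and intent-aware social attention framework for multi-agent trajectory prediction. | social interaction | |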

🔬 Pillar 4: Generative Motion (Generative Motion) (1 paper)

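| # | Title | One-line summary | Tags | 🔗 |
| 27 | Mitigating Error Accumulation in Co-Speech Motion Generation via Global Rotation Diffusion and Multi-Level Constraints | Proposes GlobalDiff, which mitigates error accumulation in co-speech motion generation through global rotation diffusion and multi-level constraints. | motion generation | |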
