cs.CV (2025-11-17)

📊 44 papers in total | 🔗 10 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception (Perception & SLAM) (25 🔗6) · Pillar 2: RL Algorithms and Architecture (14 🔗4) · Pillar 1: Robot Control (4) · Pillar 4: Generative Motion (1)

🔬 Pillar 3: Spatial Perception (Perception & SLAM) (25 papers)

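# | Title | One-line summary | Tags | 🔗
1 | Beyond Darkness: Thermal-Supervised 3D Gaussian Splatting for Low-Light Novel View Synthesis | Proposes DTGS, a thermal-supervised 3D Gaussian Splatting method for novel view synthesis in low light. | 3D gaussian splatting, 3DGS, gaussian splatting
2 | Opt3DGS: Optimizing 3D Gaussian Splatting with Adaptive Exploration and Curvature-Aware Exploitation | Opt3DGS: optimizes 3D Gaussian Splatting through adaptive exploration and curvature-aware exploitation. | 3D gaussian splatting, 3DGS, gaussian splatting
3 | SF-Recon: Simplification-Free Lightweight Building Reconstruction via 3D Gaussian Splatting | SF-Recon: simplification-free lightweight building reconstruction via 3D Gaussian Splatting. | 3D gaussian splatting, 3DGS, gaussian splatting
4 | SymGS: Leveraging Local Symmetries for 3D Gaussian Splatting Compression | SymGS: compresses 3D Gaussian Splatting models by exploiting local symmetries. | 3D gaussian splatting, 3DGS, gaussian splatting
5 | GeoX-Bench: Benchmarking Cross-View Geo-Localization and Pose Estimation Capabilities of Large Multimodal Models | GeoX-Bench: a benchmark for evaluating the cross-view geo-localization and pose estimation capabilities of large models. | pose estimation, localization, navigation
6 | Neo: Real-Time On-Device 3D Gaussian Splatting with Reuse-and-Update Sorting Acceleration | Neo: real-time on-device 3D Gaussian Splatting accelerated by reuse-and-update sorting. | 3D gaussian splatting, 3DGS, gaussian splatting
7 | GRLoc: Geometric Representation Regression for Visual Localization | Proposes GRLoc, which achieves more robust visual localization through geometric representation regression. | novel view synthesis, pose estimation, localization
8 | PFAvatar: Pose-Fusion 3D Personalized Avatar Reconstruction from Real-World Outfit-of-the-Day Photos | PFAvatar: pose-fusion personalized 3D avatar reconstruction from everyday photos. | NeRF, neural radiance, pose estimation
9 | RSPose: Ranking Based Losses for Human Pose Estimation | RSPose: proposes ranking-based losses for human pose estimation, significantly improving mAP. | pose estimation, localization
10 | Is your VLM Sky-Ready? A Comprehensive Spatial Intelligence Benchmark for UAV Navigation | Proposes SpatialSky-Bench to evaluate spatial intelligence in UAV navigation. | scene understanding, navigation
11 | CloseUpShot: Close-up Novel View Synthesis from Sparse-views via Point-conditioned Diffusion Model | Proposes CloseUpShot, which achieves close-up novel view synthesis from sparse views via a point-cloud-conditioned diffusion model. | novel view synthesis, point cloud
12 | Reconstructing 3D Scenes in Native High Dynamic Range | Proposes NH-3DGS, which reconstructs high-quality 3D scenes directly from native HDR data. | 3D gaussian splatting, 3DGS, gaussian splatting
13 | Towards Metric-Aware Multi-Person Mesh Recovery by Jointly Optimizing Human Crowd in Camera Space | Proposes depth-conditioned translation optimization and a metric-aware network for multi-person mesh recovery in camera space. | monocular depth, human mesh recovery, HMR
14 | End-to-End Multi-Person Pose Estimation with Pose-Aware Video Transformer | Proposes PAVE-Net, an end-to-end pose-aware video Transformer network for multi-person video pose estimation. | pose estimation
15 | CapeNext: Rethinking and Refining Dynamic Support Information for Category-Agnostic Pose Estimation | CapeNext: improves category-agnostic pose estimation by refining dynamic support information. | pose estimation
16 | MGCA-Net: Multi-Grained Category-Aware Network for Open-Vocabulary Temporal Action Localization | Proposes MGCA-Net, which addresses open-vocabulary temporal action localization through multi-grained category awareness. | localization
17 | Inertia-Informed Orientation Priors for Event-Based Optical Flow Estimation | Proposes an event-camera optical flow estimation method that incorporates inertial information, improving robustness and convergence. | optical flow
18 | CoordAR: One-Reference 6D Pose Estimation of Novel Objects via Autoregressive Coordinate Map Generation | CoordAR: one-reference 6D pose estimation of novel objects based on autoregressive coordinate map generation. | pose estimation
19 | Training-Free Multi-View Extension of IC-Light for Textual Position-Aware Scene Relighting | GS-Light: a Gaussian-Splatting-based, text-guided, training-free method for multi-view scene relighting. | 3DGS, gaussian splatting
20 | Part-X-MLLM: Part-aware 3D Multimodal Large Language Model | Part-X-MLLM: proposes a part-aware 3D multimodal large language model that unifies multiple 3D tasks. | point cloud
21 | Computer Vision based group activity detection and action spotting | Proposes a computer-vision-based framework for group activity detection and action spotting that combines deep learning with graph reasoning. | localization
22 | Shedding Light on VLN Robustness: A Black-box Framework for Indoor Lighting-based Adversarial Attack | Proposes a black-box framework for evaluating VLN robustness based on indoor-lighting adversarial attacks. | navigation
23 | A Lightweight 3D Anomaly Detection Method with Rotationally Invariant Features | Proposes a lightweight 3D anomaly detection method based on rotationally invariant features, improving the robustness of point cloud processing. | point cloud
24 | DiffPixelFormer: Differential Pixel-Aware Transformer for RGB-D Indoor Scene Segmentation | Proposes DiffPixelFormer to improve the accuracy and efficiency of RGB-D indoor scene segmentation. | navigation
25 | ArtiWorld: LLM-Driven Articulation of 3D Objects in Scenes | ArtiWorld: proposes an LLM-driven method for automatically generating articulation for objects in 3D scenes. | point cloud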

🔬 Pillar 2: RL Algorithms and Architecture (14 papers)

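# | Title | One-line summary | Tags | 🔗
26 | WinMamba: Multi-Scale Shifted Windows in State Space Model for 3D Object Detection | WinMamba: proposes a state space model based on multi-scale shifted windows for 3D object detection. | Mamba, state space model
27 | Reconstruction-Driven Multimodal Representation Learning for Automated Media Understanding | Proposes a reconstruction-driven multimodal autoencoder for automated media content understanding. | representation learning
28 | Distribution Matching Distillation Meets Reinforcement Learning | Proposes the DMDR framework, which combines reinforcement learning with distribution matching distillation to improve the generation quality of few-step diffusion models. | reinforcement learning
29 | Robust Defense Strategies for Multimodal Contrastive Learning: Efficient Fine-tuning Against Backdoor Attacks | Proposes an efficient fine-tuning strategy that strengthens the robustness of multimodal contrastive learning models against backdoor attacks. | contrastive learning
30 | Hybrid-Domain Adaptative Representation Learning for Gaze Estimation | Proposes hybrid-domain adaptive representation learning to address cross-domain problems in gaze estimation. | representation learning
31 | SOMA: Feature Gradient Enhanced Affine-Flow Matching for SAR-Optical Registration | SOMA: SAR-optical image registration via feature-gradient-enhanced affine flow matching. | flow matching
32 | MCAQ-YOLO: Morphological Complexity-Aware Quantization for Efficient Object Detection with Curriculum Learning | Proposes MCAQ-YOLO, which improves object detection efficiency through morphological complexity-aware quantization and targets resource-constrained scenarios. | curriculum learning
33 | DeepSport: A Multimodal Large Language Model for Comprehensive Sports Video Reasoning via Agentic Reinforcement Learning | DeepSport: a multimodal large language model based on agentic reinforcement learning for comprehensive sports video reasoning. | reinforcement learning
34 | FusionFM: All-in-One Multi-Modal Image Fusion with Flow Matching | Proposes FusionFM, which uses flow matching for efficient multi-modal image fusion. | flow matching
35 | Start Small, Think Big: Curriculum-based Relative Policy Optimization for Visual Grounding | Proposes CuRPO, a curriculum-learning-based relative policy optimization that improves CoT reasoning performance on visual grounding. | reinforcement learning, localization
36 | FUSE: A Flow-based Mapping Between Shapes | Proposes a flow-matching-based shape mapping method that efficiently supports shape matching across representations. | flow matching, point cloud
37 | CASL: Curvature-Augmented Self-supervised Learning for 3D Anomaly Detection | Proposes CASL, a curvature-augmented self-supervised learning framework for improving 3D anomaly detection. | representation learning, point cloud
38 | PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image | PhysX-Anything: the first framework to generate simulation-ready physical 3D assets from a single image. | policy learning, MuJoCo
39 | Recurrent Autoregressive Diffusion: Global Memory Meets Local Attention | Proposes the RAD framework, which addresses memory and spatiotemporal consistency in long video generation with a recurrent autoregressive diffusion model. | world model, Mamba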

🔬 Pillar 1: Robot Control (4 papers)

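# | Title | One-line summary | Tags | 🔗
40 | Free-Form Scene Editor: Enabling Multi-Round Object Manipulation like in a 3D Engine | Proposes FFSE, which enables 3D-engine-like multi-round object manipulation for image editing. | manipulation
41 | RobustGait: Robustness Analysis for Appearance Based Gait Recognition | RobustGait: a robustness analysis framework for appearance-based gait recognition. | gait
42 | Generative Photographic Control for Scene-Consistent Video Cinematic Editing | CineCtrl: proposes a generative cinematic video editing framework with fine-grained control over professional camera parameters. | manipulation
43 | Uni-Hand: Universal Hand Motion Forecasting in Egocentric Views | Uni-Hand: a universal hand motion forecasting framework for egocentric views. | manipulation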

🔬 Pillar 4: Generative Motion (1 paper)

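# | Title | One-line summary | Tags | 🔗
44 | Uni-Inter: Unifying 3D Human Motion Synthesis Across Diverse Interaction Contexts | Proposes the Uni-Inter framework to address human motion generation across diverse interaction scenarios. | motion synthesis, motion generation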
