cs.CV (2025-08-20)

📊 23 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (9 🔗1) · Pillar 9: Embodied Foundation Models (6) · Pillar 2: RL Algorithms & Architecture (4 🔗2) · Pillar 1: Robot Control (2) · Pillar 4: Generative Motion (1) · Pillar 5: Interaction & Reaction (1)

🔬 Pillar 3: Spatial Perception & Semantics (9 papers)

| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Reconstruction Using the Invisible: Intuition from NIR and Metadata for Enhanced 3D Gaussian Splatting | Proposes NIRSplat for 3D reconstruction in agricultural scenes | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 2 | GeMS: Efficient Gaussian Splatting for Extreme Motion Blur | Proposes the GeMS framework to handle extreme motion blur | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 3 | GSFix3D: Diffusion-Guided Repair of Novel Views in Gaussian Splatting | Proposes GSFix3D to address 3D reconstruction under extreme viewpoints | 3D gaussian splatting, gaussian splatting, splatting | |
| 4 | GOGS: High-Fidelity Geometry and Relighting for Glossy Objects via Gaussian Surfels | Proposes GOGS to resolve ambiguity in inverse rendering of glossy objects | 3D gaussian splatting, gaussian splatting, splatting | |
| 5 | Multiscale Video Transformers for Class Agnostic Segmentation in Autonomous Driving | Proposes multiscale video transformers for class-agnostic segmentation in autonomous driving | optical flow, spatiotemporal, large language model | |
| 6 | Reliable Smoke Detection via Optical Flow-Guided Feature Fusion and Transformer-Based Uncertainty Modeling | Proposes optical flow-guided feature fusion and transformer-based uncertainty modeling for reliable smoke detection | optical flow, spatiotemporal | |
| 7 | 6-DoF Object Tracking with Event-based Optical Flow and Frames | Proposes fusing event-camera optical flow with RGB frames for 6-DoF tracking of high-speed objects | optical flow | |
| 8 | Pixie: Fast and Generalizable Supervised Learning of 3D Physics from Pixels | Proposes PIXIE to infer physical properties of 3D scenes | gaussian splatting, splatting | |
| 9 | FOCUS: Frequency-Optimized Conditioning of DiffUSion Models for mitigating catastrophic forgetting during Test-Time Adaptation | Proposes FOCUS to mitigate catastrophic forgetting during test-time adaptation | depth estimation, monocular depth | |

🔬 Pillar 9: Embodied Foundation Models (6 papers)

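| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 10 | PB-IAD: Utilizing multimodal foundation models for semantic industrial anomaly detection in dynamic manufacturing environments | Proposes the PB-IAD framework for anomaly detection in dynamic manufacturing environments | foundation model, multimodal | |
| 11 | Seeing Further on the Shoulders of Giants: Knowledge Inheritance for Vision Foundation Models | Proposes a knowledge inheritance method to improve vision foundation model performance | foundation model | |
| 12 | MSNav: Zero-Shot Vision-and-Language Navigation with Dynamic Memory and LLM Spatial Reasoning | Proposes the MSNav framework to address spatial reasoning and memory in vision-and-language navigation | VLN, large language model | |
| 13 | AnchorSync: Global Consistency Optimization for Long Video Editing | Proposes AnchorSync to address consistency in long video editing | multimodal | |
| 14 | Locality-aware Concept Bottleneck Model | Proposes a locality-aware concept bottleneck model to address concept localization | foundation model | |
| 15 | Taming Transformer for Emotion-Controllable Talking Face Generation | Proposes an emotion-controllable talking face generation method to address multimodal relationship modeling | multimodal | |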

🔬 Pillar 2: RL Algorithms & Architecture (4 papers)

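| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 16 | UST-SSM: Unified Spatio-Temporal State Space Models for Point Cloud Video Modeling | Proposes UST-SSM to address spatio-temporal disorder in point cloud video modeling | SSM, state space model | |
| 17 | Vivid-VR: Distilling Concepts from Text-to-Video Diffusion Transformer for Photorealistic Video Restoration | Proposes Vivid-VR to address texture realism and temporal consistency in video restoration | distillation, foundation model, multimodal | |
| 18 | MS-CLR: Multi-Skeleton Contrastive Learning for Human Action Recognition | Proposes multi-skeleton contrastive learning to address skeleton-structure diversity in action recognition | contrastive learning | |
| 19 | Adversarial Generation and Collaborative Evolution of Safety-Critical Scenarios for Autonomous Vehicles | Proposes the ScenGE framework to generate safety-critical scenarios and improve autonomous driving safety | reinforcement learning, large language model | |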

🔬 Pillar 1: Robot Control (2 papers)

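| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 20 | LookOut: Real-World Humanoid Egocentric Navigation | Proposes LookOut to address egocentric navigation for humanoid robots | humanoid, humanoid robot, egocentric | |
| 21 | Fusing Monocular RGB Images with AIS Data to Create a 6D Pose Estimation Dataset for Marine Vessels | Fuses monocular RGB images with AIS data to address 6D pose estimation for marine vessels | manipulation, 6D pose estimation | |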

🔬 Pillar 4: Generative Motion (1 paper)

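| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 22 | Making Pose Representations More Expressive and Disentangled via Residual Vector Quantization | Proposes residual vector quantization to make pose representations more expressive and disentangled | text-to-motion, motion generation | |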

🔬 Pillar 5: Interaction & Reaction (1 paper)

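| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 23 | GaussianArt: Unified Modeling of Geometry and Motion for Articulated Objects | Proposes GaussianArt to address geometry and motion modeling in articulated object reconstruction | human-scene interaction | |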
