cs.CV (2025-12-31)

📊 14 papers in total | 🔗 4 with code

🎯 Topic navigation

Pillar 9: Embodied Foundation Models (5) · Pillar 3: Perception & Semantics (4 🔗1) · Pillar 2: RL & Architecture (2 🔗1) · Pillar 1: Robot Control (1 🔗1) · Pillar 4: Generative Motion (1) · Pillar 7: Motion Retargeting (1 🔗1)

🔬 Pillar 9: Embodied Foundation Models (5 papers)

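# | Title | One-line summary | Tags | 🔗
1 | VLN-MME: Diagnosing MLLMs as Language-guided Visual Navigation agents | VLN-MME: diagnoses how multimodal large language models (MLLMs) perform on language-guided visual navigation tasks. | VLN, large language model, multimodal |
2 | FinMMDocR: Benchmarking Financial Multimodal Reasoning with Scenario Awareness, Document Understanding, and Multi-Step Computation | FinMMDocR: a financial multimodal reasoning benchmark that focuses on scenario awareness, document understanding, and multi-step computation. | large language model, multimodal |
3 | MoniRefer: A Real-world Large-scale Multi-modal Dataset based on Roadside Infrastructure for 3D Visual Grounding | Introduces the MoniRefer dataset for 3D visual grounding from roadside infrastructure. | visual grounding |
4 | RGBT-Ground Benchmark: Visual Grounding Beyond RGB in Complex Real-World Scenarios | Introduces the RGBT-Ground benchmark for evaluating visual grounding on RGB-T images in complex real-world scenes. | visual grounding |
5 | EchoFoley: Event-Centric Hierarchical Control for Video Grounded Creative Sound Generation | Introduces EchoFoley, which performs video-grounded creative sound generation through event-centric hierarchical control. | multimodal |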

🔬 Pillar 3: Perception & Semantics (4 papers)

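# | Title | One-line summary | Tags | 🔗
6 | Splatwizard: A Benchmark Toolkit for 3D Gaussian Splatting Compression | Splatwizard: a comprehensive benchmarking toolkit for 3D Gaussian Splatting compression. | 3D gaussian splatting, 3DGS, gaussian splatting |
7 | FoundationSLAM: Unleashing the Power of Depth Foundation Models for End-to-End Dense Visual SLAM | FoundationSLAM: uses depth foundation models to perform end-to-end dense visual SLAM. | visual SLAM, geometric consistency, foundation model |
8 | Projection-based Adversarial Attack using Physics-in-the-Loop Optimization for Monocular Depth Estimation | Introduces a projection-based adversarial attack on monocular depth estimation that relies on physics-in-the-loop optimization. | depth estimation, monocular depth |
9 | HaineiFRDM: Explore Diffusion to Restore Defects in Fast-Movement Films | Introduces HaineiFRDM, which uses diffusion models to restore defects in fast-movement films. | optical flow |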

🔬 Pillar 2: RL & Architecture (2 papers)

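# | Title | One-line summary | Tags | 🔗
10 | UniC-Lift: Unified 3D Instance Segmentation via Contrastive Learning | UniC-Lift: unified 3D instance segmentation via contrastive learning. | contrastive learning, 3D gaussian splatting, 3DGS |
11 | PhyGDPO: Physics-Aware Groupwise Direct Preference Optimization for Physically Consistent Text-to-Video Generation | Introduces the PhyGDPO framework, which produces physically consistent text-to-video generation through physics-aware groupwise preference optimization. | direct preference optimization, chain-of-thought |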

🔬 Pillar 1: Robot Control (1 paper)

# | Title | One-line summary | Tags | 🔗
12 | ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands | ShowUI-π: introduces flow-based generative models for dexterous manipulation of GUIs. | manipulation, dexterous hand, dexterous manipulation |

🔬 Pillar 4: Generative Motion (1 paper)

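# | Title | One-line summary | Tags | 🔗
13 | Hierarchical Vector-Quantized Latents for Perceptual Low-Resolution Video Compression | Introduces a perceptual low-resolution video compression method built on hierarchical vector-quantized latents, aimed at bandwidth-constrained scenarios. | VQ-VAE, spatiotemporal |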

🔬 Pillar 7: Motion Retargeting (1 paper)

# | Title | One-line summary | Tags | 🔗
14 | GaMO: Geometry-aware Multi-view Diffusion Outpainting for Sparse-View 3D Reconstruction | GaMO: geometry-aware multi-view diffusion outpainting for sparse-view 3D reconstruction. | geometric consistency |
