cs.CV (2024-09-12)

📊 28 papers in total | 🔗 8 with code

🎯 Browse by Area of Interest

Pillar 3: Perception & Semantics (10 papers, 🔗 4)
Pillar 2: RL & Architecture (8 papers, 🔗 2)
Pillar 9: Embodied Foundation Models (7 papers, 🔗 1)
Pillar 4: Generative Motion (1 paper, 🔗 1)
Pillar 7: Motion Retargeting (1 paper)
Pillar 1: Robot Control (1 paper)

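🔬 Pillar 3: Perception & Semantics (10 papers)

# | Title | One-line summary | Tags | 🔗
1 | Advancing Depth Anything Model for Unsupervised Monocular Depth Estimation in Endoscopy | Adapts the Depth Anything model for unsupervised monocular depth estimation on endoscopic images. | depth estimation, monocular depth, Depth Anything |
2 | FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally | Proposes FlashSplat, which solves 2D-to-3D Gaussian splatting segmentation optimally via linear programming. | 3D gaussian splatting, gaussian splatting, splatting |
3 | SwinGS: Sliding Window Gaussian Splatting for Volumetric Video Streaming with Arbitrary Length | SwinGS: a sliding-window Gaussian splatting framework for real-time streaming of volumetric video of arbitrary length. | 3D gaussian splatting, 3DGS, gaussian splatting |
4 | Open-Vocabulary Remote Sensing Image Semantic Segmentation | Proposes an open-vocabulary semantic segmentation framework for remote sensing images that handles variation in orientation and scale. | semantic map, open-vocabulary, open vocabulary |
5 | LED: Light Enhanced Depth Estimation at Night | LED: uses light projected by the vehicle's headlights to improve nighttime depth estimation and thus autonomous-driving safety. | depth estimation, Depth Anything, scene understanding |
6 | Thermal3D-GS: Physics-induced 3D Gaussians for Thermal Infrared Novel-view Synthesis | Thermal3D-GS: 3D Gaussians with physics priors for thermal infrared novel-view synthesis. | 3D gaussian splatting, gaussian splatting, splatting |
7 | Expansive Supervision for Neural Radiance Field | Proposes Expansive Supervision, which speeds up NeRF training and cuts time and memory costs by supervising only a selected subset of rays. | NeRF, neural radiance field |
8 | FIReStereo: Forest InfraRed Stereo Dataset for UAS Depth Perception in Visually Degraded Environments | FIReStereo: a forest infrared stereo dataset for UAS depth perception in visually degraded environments. | depth estimation, stereo depth |
9 | Depth on Demand: Streaming Dense Depth from a Low Frame Rate Active Sensor | Proposes Depth on Demand, which combines a low-frame-rate depth sensor with a high-frame-rate RGB camera to stream accurate dense depth. | depth estimation |
10 | Deep Height Decoupling for Precise Vision-based 3D Occupancy Prediction | Proposes the Deep Height Decoupling (DHD) framework to improve the accuracy of vision-based 3D occupancy prediction. | height map |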

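🔬 Pillar 2: RL & Architecture (8 papers)

# | Title | One-line summary | Tags | 🔗
11 | DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors | DreamHOI: subject-driven generation of 3D human-object interactions using diffusion priors. | distillation, NeRF, neural radiance field |
12 | Real-time Multi-view Omnidirectional Depth Estimation for Real Scenarios based on Teacher-Student Learning with Unlabeled Data | Proposes Rt-OmniMVS, a real-time multi-view omnidirectional depth estimation method for real-world scenes, trained with teacher-student learning. | teacher-student, depth estimation |
13 | Do Vision Foundation Models Enhance Domain Generalization in Medical Image Segmentation? | Uses vision foundation models with an HQHSAM decode head to improve domain generalization in medical image segmentation. | MAE, foundation model |
14 | MambaMIC: An Efficient Baseline for Microscopic Image Classification with State Space Models | MambaMIC: an efficient state-space-model baseline for microscopic image classification. | Mamba, SSM, state space model |
15 | CollaMamba: Efficient Collaborative Perception with Cross-Agent Spatial-Temporal State Space Model | CollaMamba: a collaborative perception method built on a cross-agent spatial-temporal state space model to improve efficiency. | Mamba, SSM, state space model |
16 | Top-down Activity Representation Learning for Video Question Answering | Proposes a video question answering method based on top-down activity representation learning that better captures contextual events over long time spans. | representation learning, multimodal |
17 | Learning Brain Tumor Representation in 3D High-Resolution MR Images via Interpretable State Space Models | Proposes a state-space-model-based masked autoencoder for learning brain tumor representations from 3D high-resolution MR images. | SSM, state space model, masked autoencoder |
18 | Multi-object event graph representation learning for Video Question Answering | Proposes CLanG, which learns multi-object event graph representations contrastively to improve understanding of complex scenes in video question answering. | representation learning, contrastive learning |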

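🔬 Pillar 9: Embodied Foundation Models (7 papers)

# | Title | One-line summary | Tags | 🔗
19 | Large Language Model-Guided Semantic Alignment for Human Activity Recognition | LanHAR: human activity recognition with semantic alignment guided by large language models. | large language model |
20 | SimMAT: Exploring Transferability from Vision Foundation Models to Any Image Modality | SimMAT: explores how well vision foundation models transfer to arbitrary image modalities. | foundation model |
21 | Deep Multimodal Learning with Missing Modality: A Survey | Surveys deep multimodal learning methods that cope with missing modalities, a common problem in real-world applications. | multimodal |
22 | HiRT: Enhancing Robotic Control with Hierarchical Robot Transformers | HiRT: enhances robot control with hierarchical robot Transformers, enabling real-time interaction in dynamic tasks. | vision-language-action, VLA |
23 | What Makes a Maze Look Like a Maze? | Proposes the Deep Schema Grounding (DSG) framework to improve understanding of and reasoning about abstract visual concepts. | large language model |
24 | Bayesian Self-Training for Semi-Supervised 3D Segmentation | Proposes a Bayesian self-training framework for semi-supervised 3D segmentation that improves accuracy when labeled data are scarce. | visual grounding |
25 | Bridging Paintings and Music -- Exploring Emotion based Music Generation through Paintings | Proposes an emotion-based model that generates music from paintings, bridging visual art and music. | multimodal |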

🔬 Pillar 4: Generative Motion (1 paper)

# | Title | One-line summary | Tags | 🔗
26 | ProbTalk3D: Non-Deterministic Emotion Controllable Speech-Driven 3D Facial Animation Synthesis Using VQ-VAE | Proposes ProbTalk3D for emotion-controllable, speech-driven 3D facial animation synthesis. | VQ-VAE |

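🔬 Pillar 7: Motion Retargeting (1 paper)

# | Title | One-line summary | Tags | 🔗
27 | Depth Matters: Exploring Deep Interactions of RGB-D for Semantic Segmentation in Traffic Scenes | Proposes a deep-interaction pyramid Transformer that addresses the underuse of depth information in semantic segmentation of traffic scenes. | spatial relationship |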

🔬 Pillar 1: Robot Control (1 paper)

# | Title | One-line summary | Tags | 🔗
28 | GAZEploit: Remote Keystroke Inference Attack by Gaze Estimation from Avatar Views in VR/MR Devices | GAZEploit: a remote keystroke inference attack that estimates gaze from avatar views on VR/MR devices. | Apple Vision Pro |
