cs.CV (2024-09-12)
📊 28 papers in total | 🔗 8 with code
🎯 Interest Area Navigation
Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (10 🔗4)
Pillar 2: RL Algorithms and Architecture (RL & Architecture) (8 🔗2)
Pillar 9: Embodied Foundation Models (7 🔗1)
Pillar 4: Generative Motion (1 🔗1)
Pillar 7: Motion Retargeting (1)
Pillar 1: Robot Control (1)
🔬 Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (10 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | Advancing Depth Anything Model for Unsupervised Monocular Depth Estimation in Endoscopy | Proposes an improved Depth Anything model for unsupervised monocular depth estimation on endoscopic images | depth estimation, monocular depth, Depth Anything | | |
| 2 | FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally | Proposes FlashSplat, which optimally solves 2D-to-3D Gaussian Splatting segmentation via linear programming | 3D gaussian splatting, gaussian splatting, splatting | ✅ | |
| 3 | SwinGS: Sliding Window Gaussian Splatting for Volumetric Video Streaming with Arbitrary Length | SwinGS: a sliding-window Gaussian Splatting framework for real-time streaming of volumetric video of arbitrary length | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 4 | Open-Vocabulary Remote Sensing Image Semantic Segmentation | Proposes an open-vocabulary semantic segmentation framework for remote sensing images that addresses orientation and scale variation | semantic map, open-vocabulary, open vocabulary | ✅ | |
| 5 | LED: Light Enhanced Depth Estimation at Night | LED: uses vehicle headlights to enhance nighttime depth estimation, improving autonomous driving safety | depth estimation, Depth Anything, scene understanding | | |
| 6 | Thermal3D-GS: Physics-induced 3D Gaussians for Thermal Infrared Novel-view Synthesis | Thermal3D-GS: 3D Gaussians with physical priors for thermal infrared novel-view synthesis | 3D gaussian splatting, gaussian splatting, splatting | ✅ | |
| 7 | Expansive Supervision for Neural Radiance Field | Proposes Expansive Supervision, which accelerates NeRF training by supervising a selected subset of rays, reducing time and memory cost | NeRF, neural radiance field | | |
| 8 | FIReStereo: Forest InfraRed Stereo Dataset for UAS Depth Perception in Visually Degraded Environments | FIReStereo: a forest infrared stereo dataset for UAS depth perception in visually degraded environments | depth estimation, stereo depth | | |
| 9 | Depth on Demand: Streaming Dense Depth from a Low Frame Rate Active Sensor | Proposes Depth on Demand, which combines a low-frame-rate depth sensor with a high-frame-rate RGB camera to produce high-accuracy dense depth streams | depth estimation | | |
| 10 | Deep Height Decoupling for Precise Vision-based 3D Occupancy Prediction | Proposes the Deep Height Decoupling (DHD) framework to improve the accuracy of vision-based 3D occupancy prediction | height map | ✅ | |
🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (8 papers)
🔬 Pillar 9: Embodied Foundation Models (7 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 19 | Large Language Model-Guided Semantic Alignment for Human Activity Recognition | LanHAR: human activity recognition via semantic alignment with large language models | large language model | ✅ | |
| 20 | SimMAT: Exploring Transferability from Vision Foundation Models to Any Image Modality | SimMAT: explores transferability from vision foundation models to any image modality | foundation model | | |
| 21 | Deep Multimodal Learning with Missing Modality: A Survey | Surveys deep multimodal learning methods under missing modalities, addressing missing modality data in real-world applications | multimodal | | |
| 22 | HiRT: Enhancing Robotic Control with Hierarchical Robot Transformers | HiRT: enhances robotic control with hierarchical robot Transformers, enabling real-time interaction in dynamic tasks | vision-language-action, VLA | | |
| 23 | What Makes a Maze Look Like a Maze? | Proposes the Deep Schema Grounding (DSG) framework to improve understanding of and reasoning about visual abstract concepts | large language model | | |
| 24 | Bayesian Self-Training for Semi-Supervised 3D Segmentation | Proposes a Bayesian self-training framework for semi-supervised 3D segmentation, improving accuracy when labeled data is scarce | visual grounding | | |
| 25 | Bridging Paintings and Music -- Exploring Emotion based Music Generation through Paintings | Proposes an emotion-based model that generates music from paintings, bridging visual art and music | multimodal | | |
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 26 | ProbTalk3D: Non-Deterministic Emotion Controllable Speech-Driven 3D Facial Animation Synthesis Using VQ-VAE | Proposes ProbTalk3D for emotion-controllable speech-driven 3D facial animation synthesis | VQ-VAE | ✅ | |
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 27 | Depth Matters: Exploring Deep Interactions of RGB-D for Semantic Segmentation in Traffic Scenes | Proposes a deep-interaction pyramid Transformer to address the underuse of depth information in traffic-scene semantic segmentation | spatial relationship | | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 28 | GAZEploit: Remote Keystroke Inference Attack by Gaze Estimation from Avatar Views in VR/MR Devices | GAZEploit: a remote keystroke inference attack using gaze estimation from avatar views in VR/MR devices | Apple Vision Pro | | |