cs.CV (2024-10-23)
📊 21 papers in total | 🔗 9 with code
🎯 Interest Area Navigation
Pillar 2: RL Algorithms and Architecture (RL & Architecture) (7 🔗3)
Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (6 🔗2)
Pillar 9: Embodied Foundation Models (5 🔗2)
Pillar 1: Robot Control (2 🔗1)
Pillar 6: Video Extraction and Matching (Video Extraction) (1 🔗1)
🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (7 papers)
🔬 Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (6 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | PLGS: Robust Panoptic Lifting with 3D Gaussian Splatting | Proposes PLGS for panoptic segmentation of 3D Gaussian point clouds under noise | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 9 | VR-Splatting: Foveated Radiance Field Rendering via 3D Gaussian Splatting and Neural Points | VR-Splatting: combines 3D Gaussian splatting with neural points for foveated radiance field rendering, improving the VR experience | 3D gaussian splatting, 3DGS, gaussian splatting | ✅ | |
| 10 | Efficient Neural Implicit Representation for 3D Human Reconstruction | Proposes HumanAvatar, which combines HuMoR, Instant-NGP, and Fast-SNARF to efficiently reconstruct 3D human avatars | NeRF, neural radiance field, implicit representation | | |
| 11 | OVT-B: A New Large-Scale Benchmark for Open-Vocabulary Multi-Object Tracking | Builds OVT-B, a large-scale benchmark for open-vocabulary multi-object tracking, and proposes a baseline method that incorporates motion features | open-vocabulary, open vocabulary | ✅ | |
| 12 | Few-shot NeRF by Adaptive Rendering Loss Regularization | Proposes AR-NeRF, which tackles few-shot NeRF novel view synthesis through adaptive rendering loss regularization | NeRF, neural radiance field | | |
| 13 | Semantic Segmentation and Scene Reconstruction of RGB-D Image Frames: An End-to-End Modular Pipeline for Robotic Applications | Proposes an end-to-end modular pipeline for semantic segmentation and scene reconstruction of RGB-D image frames, to benefit robotic applications | scene reconstruction | | |
🔬 Pillar 9: Embodied Foundation Models (5 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 14 | AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models | Proposes AVHBench for evaluating cross-modal hallucinations in audio-visual large language models | large language model, multimodal | ✅ | |
| 15 | TP-Eval: Tap Multimodal LLMs' Potential in Evaluation by Customizing Prompts | TP-Eval: taps the potential of multimodal large language models in evaluation by customizing prompts | large language model, multimodal | | |
| 16 | Addressing Asynchronicity in Clinical Multimodal Fusion via Individualized Chest X-ray Generation | DDL-CXR: addresses asynchronicity in clinical multimodal fusion through individualized chest X-ray generation | multimodal | | |
| 17 | UnCLe: Benchmarking Unsupervised Continual Learning for Depth Completion | Proposes the UnCLe benchmark for evaluating unsupervised continual learning in depth completion | multimodal | | |
| 18 | ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting | Proposes visual-temporal context prompting to address open-world interaction | multimodal | ✅ | |
🔬 Pillar 1: Robot Control (2 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 19 | CARLA2Real: a tool for reducing the sim2real appearance gap in CARLA simulator | CARLA2Real: a tool for reducing the sim2real appearance gap in the CARLA simulator | sim2real | ✅ | |
| 20 | WorldSimBench: Towards Video Generation Models as World Simulators | Proposes WorldSimBench for evaluating video generation models as world simulators, covering embodied intelligence scenarios | manipulation, predictive model | | |
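🔬 Pillar 6: Video Extraction and Matching (Video Extraction) (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 21 | Robust Two-View Geometry Estimation with Implicit Differentiation | Proposes a robust two-view geometry estimation framework based on implicit differentiation, improving camera pose estimation accuracy | feature matching | ✅ | |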