cs.CV (2023-12-28)
📊 15 papers in total | 🔗 2 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (5 🔗1)
Pillar 3: Perception & Semantics (4 🔗1)
Pillar 2: RL & Architecture (3)
Pillar 1: Robot Control (2)
Pillar 4: Generative Motion (1)
🔬 Pillar 9: Embodied Foundation Models (5 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones | TinyGPT-V: an efficient multimodal large language model built on small backbones | large language model, multimodal | | |
| 2 | Grounding-Prompter: Prompting LLM with Multimodal Information for Temporal Sentence Grounding in Long Videos | Proposes Grounding-Prompter, which prompts an LLM with multimodal information to tackle temporal sentence grounding in long videos | multimodal, chain-of-thought | | |
| 3 | LISA++: An Improved Baseline for Reasoning Segmentation with Large Language Model | LISA++: an improved baseline for reasoning segmentation with large language models | large language model | | |
| 4 | Segment3D: Learning Fine-Grained Class-Agnostic 3D Segmentation without Manual Labels | Proposes Segment3D, which learns fine-grained, class-agnostic 3D segmentation without manual labels | foundation model | | |
| 5 | MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices | MobileVLM: an efficient, strong, and open vision-language assistant for mobile devices | multimodal | ✅ | |
🔬 Pillar 3: Perception & Semantics (4 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 6 | SR-LIVO: LiDAR-Inertial-Visual Odometry and Mapping with Sweep Reconstruction | SR-LIVO: LiDAR-inertial-visual odometry and mapping based on sweep reconstruction | visual odometry, VIO, LIO | | |
| 7 | DreamGaussian4D: Generative 4D Gaussian Splatting | DreamGaussian4D: proposes an efficient framework for generating 4D content based on Gaussian splatting | gaussian splatting, splatting | | |
| 8 | Robust Multi-Modal Image Stitching for Improved Scene Understanding | Proposes a robust multimodal image stitching method that improves scene understanding | scene understanding | | |
| 9 | Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis | Proposes spacetime Gaussian feature splatting for real-time novel view synthesis of dynamic scenes | splatting | ✅ | |
🔬 Pillar 2: RL & Architecture (3 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 10 | RL-LOGO: Deep Reinforcement Learning Localization for Logo Recognition | Proposes RL-LOGO, a deep reinforcement learning method for localizing and recognizing logos in unannotated logo images | reinforcement learning, deep reinforcement learning | | |
| 11 | FlowDA: Unsupervised Domain Adaptive Framework for Optical Flow Estimation | FlowDA: an unsupervised domain adaptation framework for optical flow estimation that improves real-world performance | curriculum learning, optical flow | | |
| 12 | Joint Learning for Scattered Point Cloud Understanding with Hierarchical Self-Distillation | Proposes a joint learning framework based on hierarchical self-distillation to improve understanding of sparse point clouds | masked autoencoder, MAE, distillation | | |
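🔬 Pillar 1: Robot Control (2 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 13 | InsActor: Instruction-driven Physics-based Characters | InsActor: proposes an instruction-driven framework for generating physics-based character animation, enabling control through high-level instructions | motion planning, diffusion policy, motion generation | | |
| 14 | Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action | Unified-IO 2: the first autoregressive multimodal model to support images, text, audio, and action, enabling general-purpose understanding and generation | manipulation, multimodal | | |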
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 15 | Dynamic Appearance Modeling of Clothed 3D Human Avatars using a Single Camera | Proposes a method for modeling clothed 3D humans with dynamic appearance from monocular video, addressing motion blur | physically plausible | | |