cs.CV(2023-12-28)

📊 15 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (5 🔗1) · Pillar 3: Perception & Semantics (4 🔗1) · Pillar 2: RL & Architecture (3) · Pillar 1: Robot Control (2) · Pillar 4: Generative Motion (1)

🔬 Pillar 9: Embodied Foundation Models (5 papers)

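| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones | TinyGPT-V: an efficient multimodal large language model built on small backbones. | large language model, multimodal | |
| 2 | Grounding-Prompter: Prompting LLM with Multimodal Information for Temporal Sentence Grounding in Long Videos | Proposes Grounding-Prompter, which prompts an LLM with multimodal information to solve temporal sentence grounding in long videos. | multimodal, chain-of-thought | |
| 3 | LISA++: An Improved Baseline for Reasoning Segmentation with Large Language Model | LISA++: an improved baseline for reasoning segmentation with large language models. | large language model | |
| 4 | Segment3D: Learning Fine-Grained Class-Agnostic 3D Segmentation without Manual Labels | Proposes Segment3D, which learns fine-grained, class-agnostic 3D segmentation without manual labels. | foundation model | |
| 5 | MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices | MobileVLM: an efficient, strong, and open vision-language assistant for mobile devices. | multimodal | |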

🔬 Pillar 3: Perception & Semantics (4 papers)

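| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 6 | SR-LIVO: LiDAR-Inertial-Visual Odometry and Mapping with Sweep Reconstruction | SR-LIVO: LiDAR-inertial-visual odometry and mapping based on sweep reconstruction. | visual odometry, VIO, LIO | |
| 7 | DreamGaussian4D: Generative 4D Gaussian Splatting | DreamGaussian4D: proposes an efficient framework for generating 4D content based on Gaussian splatting. | gaussian splatting, splatting | |
| 8 | Robust Multi-Modal Image Stitching for Improved Scene Understanding | Proposes a robust multi-modal image stitching method that improves scene understanding. | scene understanding | |
| 9 | Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis | Proposes spacetime Gaussian feature splatting for real-time novel view synthesis of dynamic scenes. | splatting | |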

🔬 Pillar 2: RL & Architecture (3 papers)

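| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 10 | RL-LOGO: Deep Reinforcement Learning Localization for Logo Recognition | Proposes RL-LOGO, a deep reinforcement learning method for localizing and recognizing logos in images without annotations. | reinforcement learning, deep reinforcement learning | |
| 11 | FlowDA: Unsupervised Domain Adaptive Framework for Optical Flow Estimation | FlowDA: an unsupervised domain-adaptive framework for optical flow estimation that improves performance on real-world scenes. | curriculum learning, optical flow | |
| 12 | Joint Learning for Scattered Point Cloud Understanding with Hierarchical Self-Distillation | Proposes a joint learning framework based on hierarchical self-distillation that improves understanding of sparse point clouds. | masked autoencoder, MAE, distillation | |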

🔬 Pillar 1: Robot Control (2 papers)

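| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 13 | InsActor: Instruction-driven Physics-based Characters | InsActor: proposes an instruction-driven framework for generating physics-based character animation, enabling control through high-level instructions. | motion planning, diffusion policy, motion generation | |
| 14 | Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action | Unified-IO 2: the first autoregressive multimodal model to support images, text, audio, and action, enabling general-purpose understanding and generation. | manipulation, multimodal | |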

🔬 Pillar 4: Generative Motion (1 paper)

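| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 15 | Dynamic Appearance Modeling of Clothed 3D Human Avatars using a Single Camera | Proposes a method for modeling dynamically clothed 3D humans from monocular video, addressing motion blur. | physically plausible | |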
