cs.CV (2023-12-28)

📊 15 papers in total | 🔗 2 with code

🎯 Research area navigation

Pillar 9: Embodied Foundation Models (5 🔗1) · Pillar 3: Perception & Semantics (4 🔗1) · Pillar 2: RL & Architecture (3) · Pillar 1: Robot Control (2) · Pillar 4: Generative Motion (1)

🔬 Pillar 9: Embodied Foundation Models (5 papers)

#	Title	Key point	Tags	🔗	⭐
1	TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones	TinyGPT-V: an efficient multimodal large language model built on small backbones	large language model, multimodal
2	Grounding-Prompter: Prompting LLM with Multimodal Information for Temporal Sentence Grounding in Long Videos	Proposes Grounding-Prompter, which prompts an LLM with multimodal information to solve temporal sentence grounding in long videos	multimodal, chain-of-thought
3	LISA++: An Improved Baseline for Reasoning Segmentation with Large Language Model	LISA++: an improved baseline for reasoning segmentation with a large language model	large language model
4	Segment3D: Learning Fine-Grained Class-Agnostic 3D Segmentation without Manual Labels	Proposes Segment3D, which learns fine-grained, class-agnostic 3D segmentation without manual labels	foundation model
5	MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices	MobileVLM: a fast, capable, and open vision-language assistant for mobile devices	multimodal	✅

🔬 Pillar 3: Perception & Semantics (4 papers)

#	Title	Key point	Tags	🔗	⭐
6	SR-LIVO: LiDAR-Inertial-Visual Odometry and Mapping with Sweep Reconstruction	SR-LIVO: LiDAR-inertial-visual odometry and mapping based on sweep reconstruction	visual odometry, VIO, LIO
7	DreamGaussian4D: Generative 4D Gaussian Splatting	DreamGaussian4D: an efficient framework for generating 4D content with Gaussian splatting	gaussian splatting, splatting
8	Robust Multi-Modal Image Stitching for Improved Scene Understanding	Proposes a robust multi-modal image stitching method that improves scene understanding	scene understanding
9	Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis	Proposes Spacetime Gaussian Feature Splatting for real-time novel view synthesis of dynamic scenes	splatting	✅

🔬 Pillar 2: RL & Architecture (3 papers)

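#	Title	Key point	Tags	🔗	⭐
10	RL-LOGO: Deep Reinforcement Learning Localization for Logo Recognition	Proposes RL-LOGO, a deep reinforcement learning method for localizing and recognizing logos in unannotated images	reinforcement learning, deep reinforcement learning
11	FlowDA: Unsupervised Domain Adaptive Framework for Optical Flow Estimation	FlowDA: an unsupervised domain adaptation framework for optical flow estimation that improves performance on real-world scenes	curriculum learning, optical flow
12	Joint Learning for Scattered Point Cloud Understanding with Hierarchical Self-Distillation	Proposes a joint learning framework with hierarchical self-distillation that improves understanding of scattered point clouds	masked autoencoder, MAE, distillation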

🔬 Pillar 1: Robot Control (2 papers)

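#	Title	Key point	Tags	🔗	⭐
13	InsActor: Instruction-driven Physics-based Characters	InsActor: an instruction-driven framework for physics-based character animation, enabling control through high-level instructions	motion planning, diffusion policy, motion generation
14	Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action	Unified-IO 2: the first autoregressive multimodal model to support images, text, audio, and action, enabling general-purpose understanding and generation	manipulation, multimodal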

🔬 Pillar 4: Generative Motion (1 paper)

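#	Title	Key point	Tags	🔗	⭐
15	Dynamic Appearance Modeling of Clothed 3D Human Avatars using a Single Camera	Proposes a method for modeling the dynamic appearance of clothed 3D human avatars from monocular video, addressing motion blur	physically plausible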
