cs.CV (2024-09-04)

📊 21 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (9 🔗2) · Pillar 9: Embodied Foundation Models (5 🔗1) · Pillar 2: RL & Architecture (4 🔗1) · Pillar 1: Robot Control (2 🔗1) · Pillar 4: Generative Motion (1 🔗1)

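🔬 Pillar 3: Perception & Semantics (9 papers)

# | Title | One-sentence summary | Tags | 🔗
1 | GGS: Generalizable Gaussian Splatting for Lane Switching in Autonomous Driving | Proposes GGS, a generalizable Gaussian splatting method for lane switching in autonomous driving. | depth estimation, 3D gaussian splatting, gaussian splatting
2 | Object Gaussian for Monocular 6D Pose Estimation from Sparse Views | SGPose: a Gaussian-based method for monocular 6D pose estimation from sparse views. | 3D gaussian splatting, 3DGS, gaussian splatting
3 | Plane2Depth: Hierarchical Adaptive Plane Guidance for Monocular Depth Estimation | Plane2Depth: monocular depth estimation with hierarchical adaptive plane guidance. | depth estimation, monocular depth, metric depth
4 | Human-VDM: Learning Single-Image 3D Human Gaussian Splatting from Video Diffusion Models | Human-VDM: generates high-quality 3D human Gaussian splatting models from a single image using video diffusion models. | gaussian splatting, splatting
5 | iConFormer: Dynamic Parameter-Efficient Tuning with Input-Conditioned Adaptation | iConFormer: a dynamic parameter-efficient fine-tuning method with input-conditioned adaptation. | depth estimation, monocular depth
6 | UniTT-Stereo: Unified Training of Transformer for Enhanced Stereo Matching | UniTT-Stereo: unified training of a Transformer to improve stereo matching performance. | depth estimation, stereo depth
7 | TP-GMOT: Tracking Generic Multiple Object by Textual Prompt with Motion-Appearance Cost (MAC) SORT | Proposes the TP-GMOT framework, which addresses generic multiple object tracking using textual prompts and a motion-appearance cost. | open-vocabulary, open vocabulary
8 | Boosting Generalizability towards Zero-Shot Cross-Dataset Single-Image Indoor Depth by Meta-Initialization | Proposes a meta-learning-based method for single-image depth estimation that improves zero-shot cross-dataset generalization. | depth estimation
9 | Volumetric Surfaces: Representing Fuzzy Geometries with Layered Meshes | Proposes a volumetric surface representation based on layered meshes, enabling real-time, high-quality rendering of fuzzy geometries such as hair. | splatting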

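🔬 Pillar 9: Embodied Foundation Models (5 papers)

# | Title | One-sentence summary | Tags | 🔗
10 | A Medical Multimodal Large Language Model for Pediatric Pneumonia | Proposes P2Med-MLLM, a medical multimodal large language model for pediatric pneumonia that assists diagnosis and treatment. | large language model, multimodal
11 | ExpLLM: Towards Chain of Thought for Facial Expression Recognition | Proposes ExpLLM, which uses a large language model for chain-of-thought reasoning in facial expression recognition. | large language model, chain-of-thought
12 | CanvOI, an Oncology Intelligence Foundation Model: Scaling FLOPS Differently | CanvOI: an oncology intelligence foundation model that improves pathology image analysis by scaling FLOPS differently. | foundation model
13 | Local Map Construction with SDMap: A Comprehensive Survey | Surveys SDMap-assisted local map construction methods, which offer low-cost, highly available environment perception for intelligent driving. | multimodal
14 | Unified Framework with Consistency across Modalities for Human Activity Recognition | Proposes a unified framework based on cross-modal consistency to improve video-based human activity recognition. | multimodal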

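🔬 Pillar 2: RL & Architecture (4 papers)

# | Title | One-sentence summary | Tags | 🔗
15 | UC-NeRF: Uncertainty-aware Conditional Neural Radiance Fields from Endoscopic Sparse Views | UC-NeRF: an uncertainty-aware conditional neural radiance field method for sparse endoscopic views. | distillation, NeRF, neural radiance field
16 | SG-MIM: Structured Knowledge Guided Efficient Pre-training for Dense Prediction | Proposes SG-MIM, masked image modeling guided by structured knowledge, to improve performance on dense prediction tasks. | representation learning, depth estimation, monocular depth
17 | Non-target Divergence Hypothesis: Toward Understanding Domain Gaps in Cross-Modal Knowledge Distillation | Proposes the non-target divergence hypothesis to analyze domain gaps in cross-modal knowledge distillation. | distillation
18 | Collaborative Learning for Enhanced Unsupervised Domain Adaptation | Proposes CLDA, a collaborative learning framework that improves lightweight models on unsupervised domain adaptation tasks. | teacher-student, distillation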

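🔬 Pillar 1: Robot Control (2 papers)

# | Title | One-sentence summary | Tags | 🔗
19 | MADiff: Motion-Aware Mamba Diffusion Models for Hand Trajectory Prediction on Egocentric Videos | MADiff: predicts hand trajectories in egocentric videos with a motion-aware Mamba diffusion model. | manipulation, Mamba, affordance
20 | Incorporating dense metric depth into neural 3D representations for view synthesis and relighting | Proposes a neural 3D representation that incorporates dense depth information for view synthesis and relighting. | manipulation, metric depth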

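🔬 Pillar 4: Generative Motion (1 paper)

# | Title | One-sentence summary | Tags | 🔗
21 | PoseTalk: Text-and-Audio-based Pose Control and Motion Refinement for One-Shot Talking Head Generation | PoseTalk: a one-shot talking head generation method with text- and audio-based pose control and motion refinement. | motion latent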
