cs.CV (2024-09-18)

📊 26 papers in total | 🔗 9 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (10 🔗6) · Pillar 9: Embodied Foundation Models (6 🔗1) · Pillar 2: RL & Architecture (5 🔗1) · Pillar 1: Robot Control (2) · Pillar 4: Generative Motion (1) · Pillar 7: Motion Retargeting (1) · Pillar 6: Video Extraction (1 🔗1)

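🔬 Pillar 3: Perception & Semantics (10 papers)

| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 1 | Depth Estimation Based on 3D Gaussian Splatting Siamese Defocus | Proposes a self-supervised depth-from-defocus framework based on 3D Gaussian splatting and Siamese networks. | depth estimation, monocular depth, stereo depth |
| 2 | Gradient-Driven 3D Segmentation and Affordance Transfer in Gaussian Splatting Using 2D Masks | Proposes a gradient-driven 3D Gaussian segmentation and affordance transfer method that improves 3D scene understanding. | 3D gaussian splatting, 3DGS, gaussian splatting |
| 3 | BRDF-NeRF: Neural Radiance Fields with Optical Satellite Images and BRDF Modelling | Proposes BRDF-NeRF to address BRDF modelling in satellite imagery. | NeRF, neural radiance field |
| 4 | Optical Flow Matters: an Empirical Comparative Study on Fusing Monocular Extracted Modalities for Better Steering | Proposes an end-to-end steering prediction method for autonomous driving that fuses modalities extracted from monocular input, significantly improving steering accuracy. | optical flow, multimodal |
| 5 | LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension | Proposes LLM-wrapper, which uses a large language model for black-box adaptation of vision-language models, improving referring expression comprehension. | open-vocabulary, open vocabulary, large language model |
| 6 | ORB-SfMLearner: ORB-Guided Self-supervised Visual Odometry with Selective Online Adaptation | Proposes ORB-guided self-supervised visual odometry that improves generalization through selective online adaptation. | visual odometry |
| 7 | SRIF: Semantic Shape Registration Empowered by Diffusion-based Image Morphing and Flow Estimation | Proposes SRIF, which achieves semantic shape registration using diffusion-based image morphing and flow estimation. | 3D gaussian splatting, gaussian splatting, splatting |
| 8 | Vista3D: Unravel the 3D Darkside of a Single Image | Vista3D: a fast and consistent single-image 3D generation framework that reveals an object's hidden 3D information. | gaussian splatting, splatting |
| 9 | Panoptic-Depth Forecasting | Proposes the Panoptic-Depth Forecasting task, which predicts panoptic segmentation and depth maps for future frames to improve robot navigation safety. | depth estimation |
| 10 | DAF-Net: A Dual-Branch Feature Decomposition Fusion Network with Domain Adaptive for Infrared and Visible Image Fusion | Proposes DAF-Net, which fuses infrared and visible images through dual-branch feature decomposition and domain adaptation. | scene understanding |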

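🔬 Pillar 9: Embodied Foundation Models (6 papers)

| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 11 | ChefFusion: Multimodal Foundation Model Integrating Recipe and Food Image Generation | ChefFusion: a multimodal foundation model integrating recipe and food image generation. | large language model, foundation model, multimodal |
| 12 | Large Language Models are Strong Audio-Visual Speech Recognition Learners | Proposes Llama-AVSR, which uses a multimodal LLM for strong speech and audio-visual speech recognition. | large language model, multimodal |
| 13 | Cross-Organ and Cross-Scanner Adenocarcinoma Segmentation using Rein to Fine-tune Vision Foundation Models | Uses the Rein fine-tuning method to efficiently adapt vision foundation models for cross-organ and cross-scanner adenocarcinoma segmentation. | foundation model |
| 14 | Free-VSC: Free Semantics from Visual Foundation Models for Unsupervised Video Semantic Compression | Proposes Free-VSC, which uses semantics from visual foundation models to enhance unsupervised video semantic compression. | foundation model |
| 15 | Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | Qwen2-VL: enhances a vision-language model's perception of the world through dynamic resolution. | multimodal |
| 16 | Knowledge Adaptation Network for Few-Shot Class-Incremental Learning | Proposes the Knowledge Adaptation Network (KANet) to address representation bias in few-shot class-incremental learning. | foundation model |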

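🔬 Pillar 2: RL & Architecture (5 papers)

| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 17 | StableMamba: Distillation-free Scaling of Large SSMs for Images and Videos | Proposes StableMamba, an architecture that scales large SSMs to image and video tasks without distillation. | Mamba, SSM, distillation |
| 18 | Multimodal Generalized Category Discovery | Proposes the MM-GCD framework, which addresses multimodal generalized category discovery by aligning feature and output spaces. | contrastive learning, distillation, multimodal |
| 19 | JEAN: Joint Expression and Audio-guided NeRF-based Talking Face Generation | Proposes JEAN, a NeRF-based talking face generation method jointly guided by expression and audio. | contrastive learning, NeRF |
| 20 | PhysMamba: Efficient Remote Physiological Measurement with SlowFast Temporal Difference Mamba | PhysMamba: efficient remote physiological measurement from facial videos using a temporal difference Mamba. | Mamba, SSM, state space model |
| 21 | DETECLAP: Enhancing Audio-Visual Representation Learning with Object Information | DETECLAP: enhances audio-visual representation learning with object information, improving fine-grained recognition. | representation learning, masked autoencoder |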

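🔬 Pillar 1: Robot Control (2 papers)

| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 22 | FAST GDRNPP: Improving the Speed of State-of-the-Art 6D Object Pose Estimation | Proposes FAST GDRNPP, which accelerates 6D object pose estimation while balancing accuracy and speed. | manipulation, distillation |
| 23 | Controllable Shape Modeling with Neural Generalized Cylinder | Proposes the Neural Generalized Cylinder (NGC) for controllable neural implicit shape modeling. | manipulation |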

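🔬 Pillar 4: Generative Motion (1 paper)

| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 24 | MoRAG -- Multi-Fusion Retrieval Augmented Generation for Human Motion | MoRAG: a human motion generation method based on multi-part fusion retrieval-augmented generation. | motion diffusion model, motion diffusion, motion generation |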

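🔬 Pillar 7: Motion Retargeting (1 paper)

| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 25 | LFIC-DRASC: Deep Light Field Image Compression Using Disentangled Representation and Asymmetrical Strip Convolution | Proposes LFIC-DRASC, which achieves efficient light field image compression using disentangled representation and asymmetrical strip convolution. | spatial relationship |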

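🔬 Pillar 6: Video Extraction (1 paper)

| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 26 | WiLoR: End-to-end 3D Hand Localization and Reconstruction in-the-wild | WiLoR: an end-to-end framework for 3D hand localization and reconstruction in the wild. | hand reconstruction |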
