cs.CV (2024-08-19)

📊 27 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (11 🔗4) · Pillar 9: Embodied Foundation Models (8 🔗1) · Pillar 3: Perception & Semantics (6 🔗1) · Pillar 1: Robot Control (1) · Pillar 7: Motion Retargeting (1)

🔬 Pillar 2: RL & Architecture (11 papers)

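| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 1 | 3D-Consistent Human Avatars with Sparse Inputs via Gaussian Splatting and Contrastive Learning | CHASE: achieves 3D-consistent human avatars from sparse inputs using Gaussian splatting and contrastive learning. | contrastive learning, 3D gaussian splatting, 3DGS |
| 2 | ExpoMamba: Exploiting Frequency SSM Blocks for Efficient and Effective Image Enhancement | ExpoMamba: uses frequency SSM blocks for efficient image enhancement, addressing low-light and mixed-exposure problems. | Mamba, SSM, foundation model |
| 3 | R2GenCSR: Retrieving Context Samples for Large Language Model based X-ray Medical Report Generation | R2GenCSR: a context-retrieval-based framework for X-ray medical report generation that improves LLM generation quality. | Mamba, large language model |
| 4 | MambaLoc: Efficient Camera Localisation via State Space Model | MambaLoc: an efficient camera localisation method based on state space models, addressing high training cost and data sparsity. | Mamba, SSM, state space model |
| 5 | OccMamba: Semantic Occupancy Prediction with State Space Models | Proposes OccMamba, the first Mamba-based semantic occupancy prediction network, improving efficiency and accuracy. | Mamba, state space model |
| 6 | $R^2$-Mesh: Reinforcement Learning Powered Mesh Reconstruction via Geometry and Appearance Refinement | Proposes a reinforcement-learning-based mesh reconstruction method that improves NeRF reconstruction quality through geometry and appearance refinement. | reinforcement learning, NeRF, neural radiance field |
| 7 | Factorized-Dreamer: Training A High-Quality Video Generator with Limited and Low-Quality Data | Factorized-Dreamer: trains a high-quality video generator from limited, low-quality data. | dreamer, optical flow, spatiotemporal |
| 8 | P3P: Pseudo-3D Pre-training for Scaling 3D Voxel-based Masked Autoencoders | Proposes the P3P framework, which uses pseudo-3D pre-training to scale voxel-based masked autoencoders and improve performance on 3D perception tasks. | masked autoencoder, MAE, depth estimation |
| 9 | Multi-Scale Representation Learning for Image Restoration with State-Space Model | Proposes MS-Mamba, a multi-scale image restoration network based on state-space models, for efficient, high-quality image reconstruction. | Mamba, SSM, representation learning |
| 10 | CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing Hallucinations in LVLMs | CLIP-DPO: uses vision-language model preferences for preference optimization to reduce hallucinations in LVLMs. | DPO |
| 11 | C${^2}$RL: Content and Context Representation Learning for Gloss-free Sign Language Translation and Retrieval | Proposes C${^2}$RL for gloss-free sign language translation and retrieval, improving representation learning. | representation learning |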

🔬 Pillar 9: Embodied Foundation Models (8 papers)

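| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 12 | FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant | Proposes FFAA, an explainable open-world face forgery analysis assistant based on a multimodal large language model. | large language model, multimodal |
| 13 | Kubrick: Multimodal Agent Collaborations for Synthetic Video Generation | Kubrick: a synthetic video generation framework based on multimodal agent collaboration. | large language model, multimodal, instruction following |
| 14 | CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving | Proposes the CoVLA dataset for training models with vision-language-action capabilities in autonomous driving. | vision-language-action, VLA, large language model |
| 15 | Pedestrian Attribute Recognition: A New Benchmark Dataset and A Large Language Model Augmented Framework | Proposes the MSP60K dataset and the LLM-PAR framework for pedestrian attribute recognition. | large language model |
| 16 | Narrowing the Gap between Vision and Action in Navigation | Proposes a low-level action decoder and a semantically enhanced waypoint predictor to improve vision-and-language navigation in continuous environments. | VLN |
| 17 | LongVILA: Scaling Long-Context Visual Language Models for Long Videos | LongVILA: extends vision-language models to long-video contexts through algorithm-system co-design. | foundation model |
| 18 | Video Object Segmentation via SAM 2: The 4th Solution for LSVOS Challenge VOS Track | Applies SAM 2 to video object segmentation; 4th place in the VOS track of the LSVOS Challenge. | foundation model |
| 19 | Attribution Analysis Meets Model Editing: Advancing Knowledge Correction in Vision Language Models with VisEdit | VisEdit: corrects knowledge in vision-language models by editing visual representations. | large language model |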

🔬 Pillar 3: Perception & Semantics (6 papers)

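| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 20 | Implicit Gaussian Splatting with Efficient Multi-Level Tri-Plane Representation | Proposes implicit Gaussian splatting with a multi-level tri-plane representation for efficient storage and high-quality rendering. | 3DGS, gaussian splatting, splatting |
| 21 | Topology-aware Human Avatars with Semantically-guided Gaussian Splatting | Proposes SG-GS, which uses semantically guided Gaussian splatting to reconstruct topology-aware human avatars. | gaussian splatting, splatting, SMPL |
| 22 | SHARP: Segmentation of Hands and Arms by Range using Pseudo-Depth for Enhanced Egocentric 3D Hand Pose Estimation and Action Recognition | SHARP: segments hands and arms using pseudo-depth to improve egocentric 3D hand pose estimation and action recognition. | depth estimation, egocentric |
| 23 | 3D-Aware Instance Segmentation and Tracking in Egocentric Videos | Proposes a 3D-aware instance segmentation and tracking method for egocentric videos, improving scene understanding. | scene understanding, egocentric |
| 24 | NeuFlow v2: Push High-Efficiency Optical Flow To the Limit | NeuFlow v2: pushes the efficiency limit of optical flow estimation while balancing accuracy and speed. | optical flow |
| 25 | LoopSplat: Loop Closure by Registering 3D Gaussian Splats | LoopSplat: performs loop closure by registering 3D Gaussian splats, improving global consistency in RGB-D SLAM. | 3DGS |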

🔬 Pillar 1: Robot Control (1 paper)

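| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 26 | Structure-preserving Image Translation for Depth Estimation in Colonoscopy Video | Proposes a structure-preserving image translation method that improves depth estimation accuracy in colonoscopy video. | sim2real, depth estimation, monocular depth |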

🔬 Pillar 7: Motion Retargeting (1 paper)

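| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 27 | SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views | SpaRP: a fast method for 3D object reconstruction and pose estimation from sparse views. | spatial relationship |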
