cs.CV(2024-08-03)

📊 10 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (3 🔗1) · Pillar 9: Embodied Foundation Models (3 🔗1) · Pillar 8: Physics-based Animation (2) · Pillar 2: RL & Architecture (2 🔗1)

🔬 Pillar 3: Perception & Semantics (3 papers)

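| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | E$^{3}$NeRF: Efficient Event-Enhanced Neural Radiance Fields from Blurry Images | Proposes E$^{3}$NeRF, which uses event camera data to efficiently reconstruct sharp neural radiance fields from blurry images. | NeRF, neural radiance field | |
| 2 | FBINeRF: Feature-Based Integrated Recurrent Network for Pinhole and Fisheye Neural Radiance Fields | FBINeRF: a feature-based integrated recurrent network for pinhole and fisheye neural radiance fields. | NeRF, neural radiance field, scene reconstruction | |
| 3 | Deep Patch Visual SLAM | Proposes Deep Patch Visual SLAM, which achieves efficient, low-memory monocular visual SLAM on a single GPU. | visual odometry, visual SLAM, DROID-SLAM | |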

🔬 Pillar 9: Embodied Foundation Models (3 papers)

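| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 4 | MiniCPM-V: A GPT-4V Level MLLM on Your Phone | MiniCPM-V: a GPT-4V-level multimodal large language model that can be deployed on phones. | large language model, multimodal | |
| 5 | Signal-SGN: A Spiking Graph Convolutional Network for Skeletal Action Recognition via Learning Temporal-Frequency Dynamics | Proposes Signal-SGN, which uses a spiking neural network for skeletal action recognition to improve energy efficiency. | multimodal | |
| 6 | SynopGround: A Large-Scale Dataset for Multi-Paragraph Video Grounding from TV Dramas and Synopses | Proposes the SynopGround dataset and the LGMR model to address multi-paragraph grounding in long videos. | multimodal | |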

🔬 Pillar 8: Physics-based Animation (2 papers)

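| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 7 | MultiFuser: Multimodal Fusion Transformer for Enhanced Driver Action Recognition | Proposes MultiFuser, which uses a multimodal fusion Transformer to enhance driver action recognition. | spatiotemporal, multimodal | |
| 8 | GLDiTalker: Speech-Driven 3D Facial Animation with Graph Latent Diffusion Transformer | GLDiTalker: speech-driven 3D facial animation generation based on a graph latent diffusion Transformer. | spatiotemporal | |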

🔬 Pillar 2: RL & Architecture (2 papers)

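| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 9 | JambaTalk: Speech-Driven 3D Talking Head Generation Based on Hybrid Transformer-Mamba Model | Proposes JambaTalk, a speech-driven 3D talking head generation method based on a hybrid Transformer-Mamba model. | Mamba, SSM, state space model | |
| 10 | MCPDepth: Omnidirectional Depth Estimation via Stereo Matching from Multi-Cylindrical Panoramas | MCPDepth: proposes an omnidirectional depth estimation method based on stereo matching from multi-cylindrical panoramas. | MAE, depth estimation | |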
