cs.CV(2024-08-03)
📊 10 papers total | 🔗 3 with code
🎯 Interest Area Navigation
Pillar 3: Spatial Perception & Semantics (3 🔗1)
Pillar 9: Embodied Foundation Models (3 🔗1)
Pillar 8: Physics-based Animation (2)
Pillar 2: RL Algorithms & Architecture (2 🔗1)
🔬 Pillar 3: Spatial Perception & Semantics (3 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | E$^{3}$NeRF: Efficient Event-Enhanced Neural Radiance Fields from Blurry Images | Proposes E$^{3}$NeRF, which uses event camera data to efficiently reconstruct sharp neural radiance fields from blurry images. | NeRF neural radiance field | | |
| 2 | FBINeRF: Feature-Based Integrated Recurrent Network for Pinhole and Fisheye Neural Radiance Fields | FBINeRF: a feature-based integrated recurrent network for pinhole and fisheye neural radiance fields. | NeRF neural radiance field scene reconstruction | | |
| 3 | Deep Patch Visual SLAM | Proposes Deep Patch Visual SLAM, achieving efficient, low-memory monocular visual SLAM on a single GPU. | visual odometry visual SLAM DROID-SLAM | ✅ | |
🔬 Pillar 9: Embodied Foundation Models (3 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 4 | MiniCPM-V: A GPT-4V Level MLLM on Your Phone | MiniCPM-V: a GPT-4V-level multimodal large language model that can be deployed on phones. | large language model multimodal | | |
| 5 | Signal-SGN: A Spiking Graph Convolutional Network for Skeletal Action Recognition via Learning Temporal-Frequency Dynamics | Proposes Signal-SGN, which uses spiking neural networks for skeletal action recognition to improve energy efficiency. | multimodal | | |
| 6 | SynopGround: A Large-Scale Dataset for Multi-Paragraph Video Grounding from TV Dramas and Synopses | Proposes the SynopGround dataset and the LGMR model to address multi-paragraph grounding in long videos. | multimodal | ✅ | |
🔬 Pillar 8: Physics-based Animation (2 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 7 | MultiFuser: Multimodal Fusion Transformer for Enhanced Driver Action Recognition | Proposes MultiFuser, which uses a multimodal fusion Transformer to enhance driver action recognition. | spatiotemporal multimodal | | |
| 8 | GLDiTalker: Speech-Driven 3D Facial Animation with Graph Latent Diffusion Transformer | GLDiTalker: speech-driven 3D facial animation generation based on a graph latent diffusion Transformer. | spatiotemporal | | |
🔬 Pillar 2: RL Algorithms & Architecture (2 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 9 | JambaTalk: Speech-Driven 3D Talking Head Generation Based on Hybrid Transformer-Mamba Model | Proposes JambaTalk, a speech-driven 3D talking head generation method based on a hybrid Transformer-Mamba model. | Mamba SSM state space model | | |
| 10 | MCPDepth: Omnidirectional Depth Estimation via Stereo Matching from Multi-Cylindrical Panoramas | MCPDepth: proposes an omnidirectional depth estimation method based on stereo matching from multi-cylindrical panoramas. | MAE depth estimation | ✅ | |