cs.CV (2025-03-16)
📊 5 papers in total | 🔗 3 with code
🎯 Navigation by interest area
🔬 Pillar 3: Spatial Perception & Semantics (3 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
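| 1 | Logic-RAG: Augmenting Large Multimodal Models with Visual-Spatial Knowledge for Road Scene Understanding | Proposes Logic-RAG to address the weak spatial reasoning of large multimodal models. | scene understanding, multimodal | ✅ | |
| 2 | MTGS: Multi-Traversal Gaussian Splatting | Proposes MTGS, which uses multi-traversal Gaussian splatting to reconstruct high-quality driving scenes while handling dynamic objects and appearance changes. | gaussian splatting, splatting, scene reconstruction | | |
| 3 | EgoEvGesture: Gesture Recognition Based on Egocentric Event Camera | Proposes EgoEvGesture, a lightweight gesture recognition network based on an event camera, and builds a large-scale dataset. | metric depth, egocentric, spatiotemporal | ✅ | |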
🔬 Pillar 9: Embodied Foundation Models (2 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
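| 4 | Semantic Matters: Multimodal Features for Affective Analysis | Proposes a multimodal affective analysis method that fuses speech, text, and visual modalities to improve emotion recognition accuracy. | multimodal | | |
| 5 | AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding | Proposes AdaReTaKe to address redundancy in long-video understanding. | large language model, multimodal | ✅ | |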