cs.CV (2025-03-16)

📊 5 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (3 🔗2) · Pillar 9: Embodied Foundation Models (2 🔗1)

🔬 Pillar 3: Perception & Semantics (3 papers)

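| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Logic-RAG: Augmenting Large Multimodal Models with Visual-Spatial Knowledge for Road Scene Understanding | Proposes Logic-RAG to address the insufficient spatial reasoning of large multimodal models. | scene understanding, multimodal | |
| 2 | MTGS: Multi-Traversal Gaussian Splatting | Proposes MTGS, which uses multi-traversal Gaussian splatting to reconstruct high-quality driving scenes while handling dynamic objects and appearance changes. | gaussian splatting, splatting, scene reconstruction | |
| 3 | EgoEvGesture: Gesture Recognition Based on Egocentric Event Camera | Proposes EgoEvGesture, a lightweight gesture recognition network based on an egocentric event camera, and builds a large-scale dataset. | metric depth, egocentric, spatiotemporal | |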

🔬 Pillar 9: Embodied Foundation Models (2 papers)

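| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 4 | Semantic Matters: Multimodal Features for Affective Analysis | Proposes a multimodal affective analysis method that fuses speech, text, and visual modalities to improve emotion recognition accuracy. | multimodal | |
| 5 | AdaReTaKe: Adaptive Redundancy Reduction to Perceive Longer for Video-language Understanding | Proposes AdaReTaKe to address redundancy in long-video understanding. | large language model, multimodal | |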
