cs.CV (2025-01-18)
📊 6 papers in total | 🔗 2 with code
🎯 Interest Area Navigation
Pillar 3: Spatial Perception & Semantics (3)
Pillar 9: Embodied Foundation Models (2 🔗1)
Pillar 2: RL Algorithms & Architecture (1 🔗1)
🔬 Pillar 3: Spatial Perception & Semantics (3 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | Decoupling Appearance Variations with 3D Consistent Features in Gaussian Splatting | Proposes DAVIGS, which decouples appearance variations in Gaussian Splatting to improve novel view synthesis quality. | gaussian splatting, splatting | | |
| 2 | Quadcopter Position Hold Function using Optical Flow in a Smartphone-based Flight Computer | Proposes a position-hold method for quadcopters based on optical flow computed on a smartphone. | optical flow | | |
| 3 | Multi-modal Fusion and Query Refinement Network for Video Moment Retrieval and Highlight Detection | Proposes MRNet, a multi-modal fusion and query refinement network for video moment retrieval and highlight detection. | optical flow | | |
🔬 Pillar 9: Embodied Foundation Models (2 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 4 | Can Multimodal LLMs do Visual Temporal Understanding and Reasoning? The answer is No! | Reveals the limitations of multimodal large language models in visual temporal understanding and reasoning. | large language model, multimodal | ✅ | |
| 5 | Visual RAG: Expanding MLLM visual knowledge without fine-tuning | Proposes Visual RAG, which expands the visual knowledge of MLLMs without fine-tuning. | large language model, multimodal | | |
🔬 Pillar 2: RL Algorithms & Architecture (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 6 | LD-DETR: Loop Decoder DEtection TRansformer for Video Moment Retrieval and Highlight Detection | LD-DETR: a loop decoder detection transformer for video moment retrieval and highlight detection. | contrastive learning, multimodal | ✅ | |