cs.CV (2025-01-18)

📊 6 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (3) · Pillar 9: Embodied Foundation Models (2 🔗1) · Pillar 2: RL Algorithms & Architecture (1 🔗1)

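🔬 Pillar 3: Spatial Perception & Semantics (3 papers)

#	Title	Key Point	Tags	🔗	⭐
1	Decoupling Appearance Variations with 3D Consistent Features in Gaussian Splatting	Proposes DAVIGS, which decouples appearance variations in Gaussian splatting to improve novel view synthesis quality.	gaussian splatting splatting
2	Quadcopter Position Hold Function using Optical Flow in a Smartphone-based Flight Computer	Proposes a quadcopter position hold method based on optical flow computed on a smartphone.	optical flow
3	Multi-modal Fusion and Query Refinement Network for Video Moment Retrieval and Highlight Detection	Proposes MRNet, a multi-modal fusion and query refinement network for video moment retrieval and highlight detection.	optical flow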

🔬 Pillar 9: Embodied Foundation Models (2 papers)

#	Title	Key Point	Tags	🔗	⭐
4	Can Multimodal LLMs do Visual Temporal Understanding and Reasoning? The answer is No!	Reveals the limitations of multimodal large language models in visual temporal understanding and reasoning.	large language model multimodal	✅
5	Visual RAG: Expanding MLLM visual knowledge without fine-tuning	Proposes Visual RAG, which expands the visual knowledge of MLLMs without fine-tuning.	large language model multimodal

🔬 Pillar 2: RL Algorithms & Architecture (1 paper)

#	Title	Key Point	Tags	🔗	⭐
6	LD-DETR: Loop Decoder DEtection TRansformer for Video Moment Retrieval and Highlight Detection	LD-DETR: a Loop Decoder DEtection TRansformer for video moment retrieval and highlight detection.	contrastive learning multimodal	✅
