cs.CV (2024-10-16)

📊 10 papers in total | 🔗 2 with code

🎯 Navigation by Area of Interest

Pillar 9: Embodied Foundation Models (4, 🔗2) · Pillar 2: RL & Architecture (3) · Pillar 3: Perception & Semantics (2) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 9: Embodied Foundation Models (4 papers)

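| # | Title | One-Line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio | Proposes the CMM benchmark to systematically evaluate hallucinations of large multimodal models across the language, visual, and audio modalities. | multimodal | |
| 2 | VividMed: Vision Language Model with Versatile Visual Grounding for Medicine | VividMed: a vision language model with versatile visual grounding for the medical domain. | visual grounding | |
| 3 | Automatic Mapping of Anatomical Landmarks from Free-Text Using Large Language Models: Insights from Llama-2 | Uses Llama-2 to automatically map anatomical landmarks mentioned in medical imaging reports, improving the efficiency of medical imaging workflows. | large language model | |
| 4 | DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception | DocLayout-YOLO: improves document layout analysis through diverse synthetic data and adaptive receptive fields. | multimodal | |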

🔬 Pillar 2: RL & Architecture (3 papers)

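| # | Title | One-Line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 5 | MambaBEV: An efficient 3D detection model with Mamba2 | MambaBEV: uses Mamba2 to improve both the efficiency and the accuracy of 3D object detection in bird's-eye view (BEV). | Mamba, SSM, state space model | |
| 6 | GAN Based Top-Down View Synthesis in Reinforcement Learning Environments | Proposes a GAN-based top-down view synthesis method to enhance agent perception in reinforcement learning environments. | reinforcement learning, first-person view | |
| 7 | MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization | MuVi: a video-to-music generation framework built on semantic alignment and rhythmic synchronization. | flow matching, visual pre-training | |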

🔬 Pillar 3: Perception & Semantics (2 papers)

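| # | Title | One-Line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 8 | EG-HumanNeRF: Efficient Generalizable Human NeRF Utilizing Human Prior for Sparse View | Proposes EG-HumanNeRF, which uses human priors to efficiently produce high-quality, generalizable human NeRFs from sparse views. | NeRF, neural radiance field | |
| 9 | Radon Implicit Field Transform (RIFT): Learning Scenes from Radar Signals | Proposes the Radon Implicit Field Transform (RIFT), which learns scene representations from radar signals to reduce data acquisition costs. | scene reconstruction | |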

🔬 Pillar 8: Physics-based Animation (1 paper)

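| # | Title | One-Line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 10 | UniCoN: Universal Conditional Networks for Multi-Age Embryonic Cartilage Segmentation with Sparsely Annotated Data | UniCoN: universal conditional networks for multi-age embryonic cartilage segmentation with sparsely annotated data. | UniCon | |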
