cs.CV (2024-07-17)

📊 17 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (7 🔗3) · Pillar 1: Robot Control (3 🔗2) · Pillar 3: Perception & Semantics (2) · Pillar 6: Video Extraction (2) · Pillar 4: Generative Motion (1) · Pillar 2: RL & Architecture (1) · Pillar 8: Physics-based Animation (1)

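🔬 Pillar 9: Embodied Foundation Models (7 papers)

| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models | Proposes a mixture of multimodal experts (MoME) to improve the performance of generalist multimodal large language models. | large language model, multimodal | |
| 2 | Textualized and Feature-based Models for Compound Multimodal Emotion Recognition in the Wild | Compares textualized and feature-based models for multimodal emotion recognition in complex, in-the-wild scenes. | large language model, multimodal | |
| 3 | NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models | NavGPT-2: unleashes the reasoning capability of large vision-language models for robot navigation. | VLN, large language model, instruction following | |
| 4 | Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models | Proposes a new framework for missing-modality prediction in multimodal learning. | multimodal | |
| 5 | EchoSight: Advancing Visual-Language Models with Wiki Knowledge | EchoSight: augments vision-language models with Wiki knowledge to improve knowledge-based VQA performance. | large language model, multimodal | |
| 6 | ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data | ProcTag: assesses the efficacy of document instruction data via process tagging, improving document VQA model performance. | large language model, multimodal | |
| 7 | DreamStory: Open-Domain Story Visualization by LLM-Guided Multi-Subject Consistent Diffusion | DreamStory: proposes an LLM-guided, multi-subject consistent diffusion model for open-domain story visualization. | multimodal | |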

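🔬 Pillar 1: Robot Control (3 papers)

| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 8 | Close the Sim2real Gap via Physically-based Structured Light Synthetic Data Simulation | Proposes a physically based structured-light synthetic data simulation system that narrows the sim2real gap, applied to industrial robot grasping. | sim2real, domain randomization | |
| 9 | HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects | HIMO: a new benchmark dataset for full-body human interaction with multiple objects. | manipulation, human-object interaction, HOI | |
| 10 | EmoFace: Audio-driven Emotional 3D Face Animation | EmoFace: proposes an audio-driven emotional 3D face animation method suitable for MetaHuman models. | manipulation | |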

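🔬 Pillar 3: Perception & Semantics (2 papers)

| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 11 | Generalizable Human Gaussians for Sparse View Synthesis | Proposes Generalizable Human Gaussians for 3D human reconstruction and rendering from sparse views. | gaussian splatting, splatting, NeRF | |
| 12 | Benchmarking Robust Self-Supervised Learning Across Diverse Downstream Tasks | Evaluates the robustness of self-supervised learning models across diverse downstream tasks, revealing their vulnerabilities and pointing to directions for improvement. | depth estimation, foundation model | |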

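🔬 Pillar 6: Video Extraction (2 papers)

| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 13 | NL2Contact: Natural Language Guided 3D Hand-Object Contact Modeling with Diffusion Model | Proposes NL2Contact, which uses a diffusion model for natural-language-guided 3D hand-object contact modeling. | hand-object reconstruction, large language model | |
| 14 | ActionSwitch: Class-agnostic Detection of Simultaneous Actions in Streaming Videos | ActionSwitch: proposes a class-agnostic online action detection framework for detecting simultaneous actions in streaming videos. | egocentric | |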

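🔬 Pillar 4: Generative Motion (1 paper)

| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 15 | SMooDi: Stylized Motion Diffusion Model | SMooDi: proposes a stylized motion diffusion model that generates motion driven by text content and a style motion. | motion diffusion model, motion diffusion, text-to-motion | |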

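🔬 Pillar 2: RL & Architecture (1 paper)

| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 16 | ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders | ColorMAE: explores data-independent masking strategies in masked autoencoders, improving semantic segmentation performance. | masked autoencoder, MAE | |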

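🔬 Pillar 8: Physics-based Animation (1 paper)

| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 17 | VD3D: Taming Large Video Diffusion Transformers for 3D Camera Control | VD3D: tames large video diffusion transformers to enable 3D camera control. | spatiotemporal | |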
