cs.CV (2024-06-26)

📊 27 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (11) · Pillar 3: Perception & Semantics (7 🔗2) · Pillar 1: Robot Control (3) · Pillar 5: Interaction & Reaction (2) · Pillar 6: Video Extraction (1 🔗1) · Pillar 8: Physics-based Animation (1) · Pillar 2: RL & Architecture (1) · Pillar 7: Motion Retargeting (1)

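🔬 Pillar 9: Embodied Foundation Models (11 papers)

| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 1 | A Refer-and-Ground Multimodal Large Language Model for Biomedicine | Proposes BiRD, the first multimodal large language model for refer-and-ground on biomedical images | large language model, multimodal | |
| 2 | MammothModa: Multi-Modal Large Language Model | MammothModa: a multimodal large language model that achieves SOTA performance on top of a base model | large language model, multimodal | |
| 3 | MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data | MUMU: bootstraps multimodal image generation from text-to-image data | multimodal | |
| 4 | Evaluating and Benchmarking Foundation Models for Earth Observation and Geospatial AI | Proposes an evaluation benchmark for Earth observation foundation models, improving label efficiency on remote sensing tasks | foundation model | |
| 5 | Improving EO Foundation Models with Confidence Assessment for enhanced Semantic segmentation | Proposes the CAS model, which improves semantic segmentation of remote sensing imagery through confidence assessment | foundation model | |
| 6 | On the Role of Visual Grounding in VQA | Proposes a visual grounding reasoning framework that reveals shortcut learning in VQA models | visual grounding | |
| 7 | Generative artificial intelligence in ophthalmology: multimodal retinal images for the diagnosis of Alzheimer's disease with convolutional neural networks | Uses generative AI and multimodal retinal images with convolutional neural networks to assist in diagnosing Alzheimer's disease | multimodal | |
| 8 | GUIDE: A Guideline-Guided Dataset for Instructional Video Comprehension | Proposes the GUIDE dataset for instructional video comprehension, filling the gap in task-level experiential guidance | foundation model, TAMP | |
| 9 | Chrono: A Simple Blueprint for Representing Time in MLLMs | Chrono: a simple, general sequence blueprint for representing time in MLLMs, improving video temporal grounding performance | large language model, multimodal | |
| 10 | MatchTime: Towards Automatic Soccer Game Commentary Generation | Proposes MatchTime: a temporally aligned dataset and model for automatic soccer game commentary generation | TAMP | |
| 11 | Speech2UnifiedExpressions: Synchronous Synthesis of Co-Speech Affective Face and Body Expressions from Affordable Inputs | Proposes Speech2UnifiedExpressions, which synchronously synthesizes realistic co-speech affective face and body expressions | multimodal | |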

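🔬 Pillar 3: Perception & Semantics (7 papers)

| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 12 | On Scaling Up 3D Gaussian Splatting Training | Grendel: proposes a scalable distributed system for 3D Gaussian Splatting training, addressing the memory bottleneck in high-resolution, large-scale scene reconstruction | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 13 | GS-Octree: Octree-based 3D Gaussian Splatting for Robust Object-level 3D Reconstruction Under Strong Lighting | Proposes GS-Octree to address object-level 3D reconstruction under strong lighting | 3D gaussian splatting, gaussian splatting, splatting | |
| 14 | Trimming the Fat: Efficient Compression of 3D Gaussian Splats through Pruning | Proposes a gradient-informed post-hoc pruning method for 3D Gaussian splats, enabling efficient compression | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 15 | DoubleTake: Geometry Guided Depth Estimation | DoubleTake: uses geometry-guided depth estimation to achieve high-quality 3D reconstruction at interactive rates | depth estimation, scene reconstruction | |
| 16 | VDG: Vision-Only Dynamic Gaussian for Driving Simulation | Proposes VDG: a vision-only dynamic Gaussian model for driving simulation | gaussian splatting, splatting, scene reconstruction | |
| 17 | Dynamic Gaussian Marbles for Novel View Synthesis of Casual Monocular Videos | Proposes Dynamic Gaussian Marbles for novel view synthesis from monocular videos, improving the quality of dynamic scene geometry reconstruction | gaussian splatting, splatting | |
| 18 | Towards Human-Level 3D Relative Pose Estimation: Generalizable, Training-Free, with Single Reference | Proposes a generalizable, training-free 3D relative pose estimation method that uses a single reference image | semantic map | |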

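🔬 Pillar 1: Robot Control (3 papers)

| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 19 | GaussianDreamerPro: Text to Manipulable 3D Gaussians with Highly Enhanced Quality | GaussianDreamerPro: proposes a framework for generating high-quality, manipulable 3D Gaussians from text | manipulation, dreamer, 3D gaussian splatting | |
| 20 | 3D Feature Distillation with Object-Centric Priors | Proposes a 3D feature distillation method based on object-centric priors, improving language-guided robot manipulation from single-view RGB-D images | manipulation, distillation, open-vocabulary | |
| 21 | CTS: Sim-to-Real Unsupervised Domain Adaptation on 3D Detection | Proposes the CTS framework to address sim-to-real unsupervised domain adaptation in 3D detection | sim-to-real | |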

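🔬 Pillar 5: Interaction & Reaction (2 papers)

| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 22 | Geometric Features Enhanced Human-Object Interaction Detection | Proposes GeoHOI, which uses geometric features to improve Transformer-based human-object interaction detection in occluded scenes | human-object interaction, HOI | |
| 23 | Human-Aware 3D Scene Generation with Spatially-constrained Diffusion Models | Proposes a human-aware 3D scene generation method based on spatially constrained diffusion models, addressing object overlap | human-object interaction, human-scene interaction, human motion | |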

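🔬 Pillar 6: Video Extraction (1 paper)

| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 24 | EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation | Proposes EgoVideo, an egocentric vision foundation model, and applies it successfully to multiple tasks in the EgoVis challenges | egocentric, Ego4D, foundation model | |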

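🔬 Pillar 8: Physics-based Animation (1 paper)

| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 25 | Changen2: Multi-Temporal Remote Sensing Generative Change Foundation Model | Proposes Changen2: a multi-temporal remote sensing generative change foundation model that generates change data for training change detection models | spatiotemporal, foundation model | |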

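🔬 Pillar 2: RL & Architecture (1 paper)

| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 26 | On Reducing Activity with Distillation and Regularization for Energy Efficient Spiking Neural Networks | Proposes an SNN training method based on knowledge distillation and regularization that reduces activity while preserving accuracy | distillation | |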

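🔬 Pillar 7: Motion Retargeting (1 paper)

| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 27 | DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image | DICE: the first end-to-end method for capturing hand-face interaction deformation from a single image | spatial relationship | |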
