cs.CV (2024-06-07)

📊 26 papers | 🔗 10 with code

🎯 Research-Area Navigation

Pillar 9: Embodied Foundation Models (7 papers, 🔗 5 with code)
Pillar 3: Perception & Semantics (6 papers, 🔗 3 with code)
Pillar 2: RL & Architecture (5 papers)
Pillar 1: Robot Control (3 papers)
Pillar 7: Motion Retargeting (2 papers, 🔗 1 with code)
Pillar 5: Interaction & Reaction (1 paper, 🔗 1 with code)
Pillar 6: Video Extraction (1 paper)
Pillar 8: Physics-based Animation (1 paper)

🔬 Pillar 9: Embodied Foundation Models (7 papers)

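# | Title | Key point | Tags | 🔗
1 | Towards Semantic Equivalence of Tokenization in Multimodal LLM | Proposes SeTok, a dynamic, semantically equivalent visual tokenization method that improves multimodal LLM performance. | large language model, multimodal |
2 | MGIMM: Multi-Granularity Instruction Multimodal Model for Attribute-Guided Remote Sensing Image Detailed Description | Proposes MGIMM, which uses multi-granularity instruction learning to generate attribute-guided detailed descriptions of remote sensing images. | large language model, multimodal |
3 | VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging | VISTA3D: a unified segmentation foundation model for 3D medical imaging. | foundation model |
4 | RU-AI: A Large Multimodal Dataset for Machine-Generated Content Detection | Introduces RU-AI, a large-scale multimodal dataset for detecting machine-generated content. | multimodal |
5 | Interpretable Multimodal Out-of-context Detection with Soft Logic Regularization | Proposes LOGRAN, which uses soft logic regularization for interpretable detection of multimodal out-of-context misinformation. | multimodal |
6 | LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model | Proposes LocLLM, which uses a large language model for more generalizable human keypoint localization driven by text descriptions. | large language model |
7 | Predictive Dynamic Fusion | Proposes a predictive dynamic fusion framework that addresses instability in multimodal fusion. | multimodal |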

🔬 Pillar 3: Perception & Semantics (6 papers)

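# | Title | Key point | Tags | 🔗
8 | USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation | Proposes USE (Universal Segment Embeddings), a framework that addresses accurate classification in open-vocabulary image segmentation. | open-vocabulary, open vocabulary, foundation model |
9 | OVMR: Open-Vocabulary Recognition with Multi-Modal References | Proposes OVMR, which uses multimodal reference information for open-vocabulary recognition. | open-vocabulary, open vocabulary |
10 | Composition Vision-Language Understanding via Segment and Depth Anything Model | Proposes fusing depth and segmentation models to enhance vision-language understanding. | Depth Anything, multimodal |
11 | Multi-style Neural Radiance Field with AdaIN | Proposes a multi-style neural radiance field that combines AdaIN with NeRF for stylized novel view synthesis. | NeRF, neural radiance field |
12 | Normal-guided Detail-Preserving Neural Implicit Function for High-Fidelity 3D Surface Reconstruction | Proposes a normal-guided neural implicit function for high-fidelity 3D surface reconstruction, particularly suited to sparse-view settings. | monocular depth, implicit representation |
13 | Ada-VE: Training-Free Consistent Video Editing Using Adaptive Motion Prior | Proposes an adaptive motion prior to keep video edits consistent. | optical flow |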

🔬 Pillar 2: RL & Architecture (5 papers)

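# | Title | Key point | Tags | 🔗
14 | STAR: Skeleton-aware Text-based 4D Avatar Generation with In-Network Motion Retargeting | STAR: proposes a skeleton-aware, text-driven 4D avatar generation method with in-network motion retargeting. | distillation, motion retargeting |
15 | Efficient 3D Shape Generation via Diffusion Mamba with Bidirectional SSMs | Proposes Diffusion Mamba (DiM-3D), which efficiently generates high-resolution 3D shapes and addresses the computational bottleneck of conventional diffusion models. | Mamba, SSM |
16 | Joint Spatial-Temporal Modeling and Contrastive Learning for Self-supervised Heart Rate Measurement | Proposes a self-supervised heart rate measurement method that combines spatiotemporal modeling with contrastive learning; placed second in the RePSS Challenge. | contrastive learning |
17 | MA-AVT: Modality Alignment for Parameter-Efficient Audio-Visual Transformers | MA-AVT: proposes a parameter-efficient audio-visual Transformer that improves performance through modality alignment. | contrastive learning, multimodal |
18 | Attention Fusion Reverse Distillation for Multi-Lighting Image Anomaly Detection | Proposes Attention Fusion Reverse Distillation (AFRD) for anomaly detection in multi-lighting images. | distillation |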

🔬 Pillar 1: Robot Control (3 papers)

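# | Title | Key point | Tags | 🔗
19 | Training-Free Video Editing via Optical Flow-Enhanced Score Distillation | Proposes a training-free video editing method based on optical-flow-enhanced score distillation. | manipulation, distillation, optical flow |
20 | 3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination | Introduces the 3D-GRAND dataset, which improves the scene understanding of 3D-LLMs and reduces hallucination. | sim-to-real, embodied AI, large language model |
21 | Varying Manifolds in Diffusion: From Time-varying Geometries to Visual Saliency | Proposes a generation-rate-based geometric analysis of diffusion models that enables image saliency manipulation and several other image editing tasks. | manipulation |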

🔬 Pillar 7: Motion Retargeting (2 papers)

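# | Title | Key point | Tags | 🔗
22 | Diving Deep into the Motion Representation of Video-Text Models | Uses GPT-4 to generate fine-grained motion descriptions that improve how well video-text models understand motion in video. | motion representation |
23 | SMC++: Masked Learning of Unsupervised Video Semantic Compression | Proposes SMC++, a masked-learning framework for unsupervised video semantic compression that improves performance on video analysis tasks. | motion prediction |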

🔬 Pillar 5: Interaction & Reaction (1 paper)

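# | Title | Key point | Tags | 🔗
24 | SMART: Scene-motion-aware human action recognition framework for mental disorder group | Proposes SMART, a scene- and motion-aware action recognition framework for patients with mental disorders, intended for intelligent medical video surveillance. | human-scene interaction, human motion |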

🔬 Pillar 6: Video Extraction (1 paper)

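# | Title | Key point | Tags | 🔗
25 | ProMotion: Prototypes As Motion Learners | ProMotion: proposes a unified, prototype-based motion modeling framework that improves performance across multiple motion tasks. | feature matching, motion representation |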

🔬 Pillar 8: Physics-based Animation (1 paper)

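# | Title | Key point | Tags | 🔗
26 | Semantic Segmentation on VSPW Dataset through Masked Video Consistency | Proposes a semantic segmentation method based on masked video consistency that improves performance on the VSPW dataset. | spatiotemporal, multimodal |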
