cs.CV (2024-05-21)

📊 33 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms and Architecture (11 🔗2) · Pillar 9: Embodied Foundation Models (11) · Pillar 3: Spatial Perception and Semantics (6 🔗2) · Pillar 8: Physics-based Animation (2 🔗1) · Pillar 6: Video Extraction and Matching (1 🔗1) · Pillar 4: Generative Motion (1) · Pillar 1: Robot Control (1)

🔬 Pillar 2: RL Algorithms and Architecture (11 papers)

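# | Title | Key point | Tags | 🔗
1 | Physics-based Scene Layout Generation from Human Motion | Proposes a physics-based scene layout generation method that yields realistic human-scene interaction animation. | reinforcement learning, affordance, physically plausible
2 | Cross-spectral Gated-RGB Stereo Depth Estimation | Proposes a cross-spectral gated-RGB stereo depth estimation method that improves long-range depth accuracy. | MAE, depth estimation, stereo depth
3 | AMFD: Distillation via Adaptive Multimodal Fusion for Multispectral Pedestrian Detection | Proposes the AMFD framework, which improves the efficiency of multispectral pedestrian detection through adaptive multimodal fusion distillation. | distillation, multimodal
4 | 3DSS-Mamba: 3D-Spectral-Spatial Mamba for Hyperspectral Image Classification | Proposes 3DSS-Mamba for hyperspectral image classification, improving the efficiency of long-range dependency modeling. | Mamba, state space model, HSI
5 | A Survey of Deep Learning-based Radiology Report Generation Using Multimodal Data | Surveys deep learning methods for radiology report generation from multimodal data, focusing on data fusion and model interpretability. | contrastive learning, multimodal
6 | A Multimodal Learning-based Approach for Autonomous Landing of UAV | Proposes a multimodal learning approach to autonomous UAV landing that improves accuracy and environmental adaptability. | reinforcement learning, multimodal
7 | Active Object Detection with Knowledge Aggregation and Distillation from Large Models | Proposes an active object detection method based on knowledge aggregation and distillation, improving detection accuracy in interaction scenes. | distillation, affordance, Ego4D
8 | RemoCap: Disentangled Representation Learning for Motion Capture | RemoCap proposes disentangled representation learning to tackle 3D human motion capture under complex occlusion. | representation learning, penetration
9 | CLRKDNet: Speeding up Lane Detection with Knowledge Distillation | CLRKDNet uses knowledge distillation to speed up lane detection, improving real-time performance for autonomous driving. | teacher-student, distillation
10 | BIMM: Brain Inspired Masked Modeling for Video Representation Learning | Proposes BIMM, a brain-inspired masked modeling framework for video representation learning. | representation learning
11 | C3L: Content Correlated Vision-Language Instruction Tuning Data Generation via Contrastive Learning | Proposes C3L, which uses contrastive learning to generate content-correlated vision-language instruction tuning data, improving LVLM performance. | contrastive learning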

🔬 Pillar 9: Embodied Foundation Models (11 papers)

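# | Title | Key point | Tags | 🔗
12 | Single Image Unlearning: Efficient Machine Unlearning in Multimodal Large Language Models | Proposes SIU, a single-image unlearning method for effectively forgetting visual concepts in multimodal large language models. | large language model, multimodal
13 | Multimodal Adaptive Inference for Document Image Classification with Anytime Early Exiting | Proposes multimodal adaptive inference with Anytime Early Exiting, improving both the performance and the efficiency of document image classification. | foundation model, multimodal
14 | Context-Enhanced Video Moment Retrieval with Large Language Models | Proposes the LMR model, which uses large language models to enrich video context and improve video moment retrieval. | large language model, language conditioned
15 | CamViG: Camera Aware Image-to-Video Generation with Multimodal Transformers | CamViG: camera-aware image-to-video generation based on multimodal Transformers. | multimodal
16 | BiomedParse: a biomedical foundation model for image parsing of everything everywhere all at once | BiomedParse: a general-purpose foundation model for biomedical image parsing that handles all tasks at once. | foundation model
17 | Comprehensive Multimodal Deep Learning Survival Prediction Enabled by a Transformer Architecture: A Multicenter Study in Glioblastoma | Proposes a Transformer-based multimodal deep learning model that improves survival prediction accuracy in glioblastoma. | multimodal
18 | An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation | Uses large language models to improve text understanding in text-to-image generation. | large language model
19 | Multimodal video analysis for crowd anomaly detection using open access tourism cameras | Proposes a crowd anomaly detection method based on open-access tourism cameras and multimodal video analysis. | multimodal
20 | Is Dataset Quality Still a Concern in Diagnosis Using Large Foundation Model? | Shows that large pretrained models are more robust to dataset quality in fundus diagnosis. | foundation model
21 | Mutual Information Analysis in Multimodal Learning Systems | Proposes InfoMeter, which uses mutual information analysis to improve multimodal 3D object detection systems. | multimodal
22 | Towards Retrieval-Augmented Architectures for Image Captioning | Proposes a retrieval-augmented image captioning architecture that draws on an external knowledge base to improve generation quality. | multimodal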

🔬 Pillar 3: Spatial Perception and Semantics (6 papers)

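# | Title | Key point | Tags | 🔗
23 | MOSS: Motion-based 3D Clothed Human Synthesis from Monocular Video | Proposes the MOSS framework, which uses motion information to synthesize realistic 3D clothed human models from monocular video. | gaussian splatting, splatting, NeRF
24 | WorldAfford: Affordance Grounding based on Natural Language Instructions | Proposes the WorldAfford framework for localizing affordance regions from natural language instructions. | affordance, chain-of-thought
25 | Leveraging Neural Radiance Fields for Pose Estimation of an Unknown Space Object during Proximity Operations | Proposes a NeRF-based pose estimation method for an unknown space object during proximity operations. | NeRF, neural radiance field
26 | Gaussian Control with Hierarchical Semantic Graphs in 3D Human Recovery | Proposes the HUGS framework, which controls 3D Gaussians with hierarchical semantic graphs to improve 3D human reconstruction quality. | 3D gaussian splatting, 3DGS, gaussian splatting
27 | Rethink Predicting the Optical Flow with the Kinetics Perspective | Proposes an optical flow prediction method from a kinetics perspective, improving performance under occlusion and fast motion. | optical flow
28 | Anticipating Object State Changes in Long Procedural Videos | Proposes the Ego4D-OSCA dataset for anticipating object state changes in long procedural videos. | scene understanding, Ego4D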

🔬 Pillar 8: Physics-based Animation (2 papers)

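# | Title | Key point | Tags | 🔗
29 | Identity-free Artificial Emotional Intelligence via Micro-Gesture Understanding | Proposes an identity-free artificial emotional intelligence method based on micro-gesture understanding, improving emotion understanding. | spatiotemporal, large language model
30 | Text-Video Retrieval with Global-Local Semantic Consistent Learning | Proposes GLSCL, a global-local semantic consistent learning method for efficient text-video retrieval. | PULSE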

🔬 Pillar 6: Video Extraction and Matching (1 paper)

# | Title | Key point | Tags | 🔗
31 | OmniGlue: Generalizable Feature Matching with Foundation Model Guidance | OmniGlue: a generalizable feature matching method guided by foundation models. | feature matching, foundation model

🔬 Pillar 4: Generative Motion (1 paper)

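# | Title | Key point | Tags | 🔗
32 | DisenStudio: Customized Multi-subject Text-to-Video Generation with Disentangled Spatial Control | DisenStudio proposes a multi-subject text-to-video generation framework with disentangled spatial control. | motion generation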

🔬 Pillar 1: Robot Control (1 paper)

# | Title | Key point | Tags | 🔗
33 | EmoEdit: Evoking Emotions through Image Manipulation | EmoEdit evokes emotions by manipulating image content, improving affective image editing. | manipulation
