cs.CV (2024-05-13)

📊 23 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (8, 🔗 2) · Pillar 9: Embodied Foundation Models (7, 🔗 1) · Pillar 3: Perception & Semantics (3) · Pillar 4: Generative Motion (2) · Pillar 7: Motion Retargeting (1) · Pillar 8: Physics-based Animation (1) · Pillar 1: Robot Control (1)

🔬 Pillar 2: RL & Architecture (8 papers)

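# | Title | One-line summary | Tags | 🔗
1 | SignAvatar: Sign Language 3D Motion Reconstruction and Generation | SignAvatar: proposes a Transformer-based framework for sign language 3D motion reconstruction and generation. | curriculum learning, motion reconstruction
2 | FORESEE: Multimodal and Multi-view Representation Learning for Robust Prediction of Cancer Survival | FORESEE: a multimodal, multi-view representation learning framework for cancer survival prediction. | representation learning, masked autoencoder, multimodal
3 | Motion Keyframe Interpolation for Any Human Skeleton via Temporally Consistent Point Cloud Sampling and Reconstruction | Proposes PC-MRL, which interpolates motion keyframes for any human skeleton via temporally consistent point cloud sampling and reconstruction. | representation learning, human motion, motion representation
4 | OverlapMamba: Novel Shift State Space Model for LiDAR-based Place Recognition | Proposes OverlapMamba, a novel shift state space model for LiDAR-based place recognition. | Mamba, SSM, state space model
5 | MambaOut: Do We Really Need Mamba for Vision? | Proposes the MambaOut model, showing that Mamba is unnecessary for image classification and exploring its potential for long-sequence visual tasks. | Mamba, SSM, state space model
6 | GMSR: Gradient-Guided Mamba for Spectral Reconstruction from RGB Images | Proposes GMSR-Net, a gradient-guided Mamba network for spectral reconstruction from RGB images that balances accuracy and efficiency. | Mamba
7 | MonoMAE: Enhancing Monocular 3D Detection through Depth-Aware Masked Autoencoders | MonoMAE: enhances monocular 3D object detection through depth-aware masked autoencoders. | masked autoencoder
8 | Sakuga-42M Dataset: Scaling Up Cartoon Research | Proposes Sakuga-42M, a large-scale cartoon dataset to advance research on cartoon video understanding and generation. | Mamba, foundation model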

🔬 Pillar 9: Embodied Foundation Models (7 papers)

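# | Title | One-line summary | Tags | 🔗
9 | MedVersa: A Generalist Foundation Model for Medical Image Interpretation | MedVersa: a generalist foundation model for medical image interpretation, with performance comparable to expert systems. | foundation model, multimodal
10 | AnoVox: A Benchmark for Multimodal Anomaly Detection in Autonomous Driving | AnoVox: a large-scale benchmark dataset for multimodal anomaly detection in autonomous driving. | multimodal
11 | IHC Matters: Incorporating IHC analysis to H&E Whole Slide Image Analysis for Improved Cancer Grading via Two-stage Multimodal Bilinear Pooling Fusion | Proposes a two-stage multimodal bilinear pooling fusion model that uses IHC to improve cancer grading on H&E images. | multimodal
12 | Improving Multimodal Learning with Multi-Loss Gradient Modulation | Proposes a multi-loss gradient modulation method to improve modality fusion in multimodal learning. | multimodal
13 | FreeVA: Offline MLLM as Training-Free Video Assistant | FreeVA: a training-free video assistant built on an offline MLLM that outperforms video instruction tuning methods. | large language model, multimodal
14 | Investigating the Semantic Robustness of CLIP-based Zero-Shot Anomaly Segmentation | Studies the semantic robustness of CLIP-based zero-shot anomaly segmentation, revealing performance degradation under semantic perturbations. | foundation model
15 | Can Better Text Semantics in Prompt Tuning Improve VLM Generalization? | Proposes a prompt tuning method based on class descriptions from an LLM to improve VLM generalization. | large language model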

🔬 Pillar 3: Perception & Semantics (3 papers)

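# | Title | One-line summary | Tags | 🔗
16 | GaussianVTON: 3D Human Virtual Try-ON via Multi-Stage Gaussian Splatting Editing with Image Prompting | GaussianVTON: proposes a 3D human virtual try-on method based on multi-stage Gaussian splatting editing with image prompting. | gaussian splatting, splatting
17 | Synergistic Integration of Coordinate Network and Tensorial Feature for Improving Neural Radiance Fields from Sparse Inputs | Proposes a NeRF method that combines a coordinate network with tensorial features to improve reconstruction from sparse inputs. | NeRF, neural radiance field
18 | SceneFactory: A Workflow-centric and Unified Framework for Incremental Scene Modeling | SceneFactory: a workflow-centric, unified framework for incremental scene modeling that supports multiple reconstruction tasks. | depth estimation, scene reconstruction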

🔬 Pillar 4: Generative Motion (2 papers)

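# | Title | One-line summary | Tags | 🔗
19 | Establishing a Unified Evaluation Framework for Human Motion Generation: A Comparative Analysis of Metrics | Proposes a unified evaluation framework for human motion generation, comparing existing metrics and introducing a temporal-warping diversity metric. | motion generation, human motion, human motion generation
20 | Generating Human Motion in 3D Scenes from Text Descriptions | Proposes a method for generating human-scene interaction motion in 3D scenes from text descriptions. | motion generation, human-scene interaction, human motion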

🔬 Pillar 7: Motion Retargeting (1 paper)

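# | Title | One-line summary | Tags | 🔗
21 | oTTC: Object Time-to-Contact for Motion Estimation in Autonomous Driving | Proposes oTTC, which extends object detection models to estimate object time-to-contact in autonomous driving. | motion estimation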

🔬 Pillar 8: Physics-based Animation (1 paper)

# | Title | One-line summary | Tags | 🔗
22 | A Semantic and Motion-Aware Spatiotemporal Transformer Network for Action Detection | Proposes a semantic and motion-aware spatiotemporal Transformer network for action detection. | spatiotemporal

🔬 Pillar 1: Robot Control (1 paper)

# | Title | One-line summary | Tags | 🔗
23 | Boostlet.js: Image processing plugins for the web via JavaScript injection | Boostlet.js: image processing plugins for the web via JavaScript injection. | manipulation
