cs.CV (2024-05-13)
📊 23 papers in total | 🔗 3 with code
🎯 Interest Area Navigation
Pillar 2: RL & Architecture (8 🔗2)
Pillar 9: Embodied Foundation Models (7 🔗1)
Pillar 3: Perception & Semantics (3)
Pillar 4: Generative Motion (2)
Pillar 7: Motion Retargeting (1)
Pillar 8: Physics-based Animation (1)
Pillar 1: Robot Control (1)
🔬 Pillar 2: RL & Architecture (8 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | SignAvatar: Sign Language 3D Motion Reconstruction and Generation | SignAvatar: proposes a Transformer-based framework for 3D sign language motion reconstruction and generation. | curriculum learning, motion reconstruction | | |
| 2 | FORESEE: Multimodal and Multi-view Representation Learning for Robust Prediction of Cancer Survival | FORESEE: a multimodal, multi-view representation learning framework for cancer survival prediction. | representation learning, masked autoencoder, multimodal | | |
| 3 | Motion Keyframe Interpolation for Any Human Skeleton via Temporally Consistent Point Cloud Sampling and Reconstruction | Proposes PC-MRL, which interpolates motion keyframes for any human skeleton via temporally consistent point cloud sampling and reconstruction. | representation learning, human motion, motion representation | | |
| 4 | OverlapMamba: Novel Shift State Space Model for LiDAR-based Place Recognition | Proposes OverlapMamba, a novel shift state space model for LiDAR-based place recognition. | Mamba, SSM, state space model | | |
| 5 | MambaOut: Do We Really Need Mamba for Vision? | Proposes the MambaOut model, showing that Mamba is unnecessary for image classification and exploring its potential for long-sequence vision tasks. | Mamba, SSM, state space model | ✅ | |
| 6 | GMSR:Gradient-Guided Mamba for Spectral Reconstruction from RGB Images | Proposes GMSR-Net, a gradient-guided Mamba network for spectral reconstruction from RGB images that balances accuracy and efficiency. | Mamba | ✅ | |
| 7 | MonoMAE: Enhancing Monocular 3D Detection through Depth-Aware Masked Autoencoders | MonoMAE: enhances monocular 3D object detection with depth-aware masked autoencoders. | masked autoencoder | | |
| 8 | Sakuga-42M Dataset: Scaling Up Cartoon Research | Proposes Sakuga-42M, a large-scale cartoon dataset, to advance research on cartoon video understanding and generation. | Mamba, foundation model | | |
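🔬 Pillar 9: Embodied Foundation Models (7 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 9 | MedVersa: A Generalist Foundation Model for Medical Image Interpretation | MedVersa: a generalist foundation model for medical image interpretation with performance comparable to specialist systems. | foundation model, multimodal | | |
| 10 | AnoVox: A Benchmark for Multimodal Anomaly Detection in Autonomous Driving | AnoVox: a large-scale benchmark dataset for multimodal anomaly detection in autonomous driving. | multimodal | | |
| 11 | IHC Matters: Incorporating IHC analysis to H&E Whole Slide Image Analysis for Improved Cancer Grading via Two-stage Multimodal Bilinear Pooling Fusion | Proposes a two-stage multimodal bilinear pooling fusion model that uses IHC to improve cancer grading on H&E images. | multimodal | | |
| 12 | Improving Multimodal Learning with Multi-Loss Gradient Modulation | Proposes a multi-loss gradient modulation method that improves modality fusion in multimodal learning. | multimodal | | |
| 13 | FreeVA: Offline MLLM as Training-Free Video Assistant | FreeVA: a training-free video assistant built on an offline MLLM that outperforms video instruction-tuned methods. | large language model, multimodal | ✅ | |
| 14 | Investigating the Semantic Robustness of CLIP-based Zero-Shot Anomaly Segmentation | Studies the semantic robustness of CLIP-based zero-shot anomaly segmentation, revealing performance degradation under semantic perturbations. | foundation model | | |
| 15 | Can Better Text Semantics in Prompt Tuning Improve VLM Generalization? | Proposes a prompt tuning method based on LLM class descriptions to improve VLM generalization. | large language model | | |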
🔬 Pillar 3: Perception & Semantics (3 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 16 | GaussianVTON: 3D Human Virtual Try-ON via Multi-Stage Gaussian Splatting Editing with Image Prompting | GaussianVTON: proposes a 3D human virtual try-on method based on multi-stage Gaussian Splatting editing with image prompting. | gaussian splatting, splatting | | |
| 17 | Synergistic Integration of Coordinate Network and Tensorial Feature for Improving Neural Radiance Fields from Sparse Inputs | Proposes a NeRF method that combines a coordinate network with tensorial features to improve reconstruction from sparse inputs. | NeRF, neural radiance field | | |
| 18 | SceneFactory: A Workflow-centric and Unified Framework for Incremental Scene Modeling | SceneFactory: a workflow-centric, unified framework for incremental scene modeling that supports multiple reconstruction tasks. | depth estimation, scene reconstruction | | |
🔬 Pillar 4: Generative Motion (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 19 | Establishing a Unified Evaluation Framework for Human Motion Generation: A Comparative Analysis of Metrics | Proposes a unified evaluation framework for human motion generation, comparing existing metrics and introducing a temporal-warping diversity metric. | motion generation, human motion, human motion generation | | |
| 20 | Generating Human Motion in 3D Scenes from Text Descriptions | Proposes a method for generating human-scene interaction motions in 3D scenes from text descriptions. | motion generation, human-scene interaction, human motion | | |
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 21 | oTTC: Object Time-to-Contact for Motion Estimation in Autonomous Driving | Proposes oTTC, which extends object detection models to estimate object time-to-contact in autonomous driving. | motion estimation | | |
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 22 | A Semantic and Motion-Aware Spatiotemporal Transformer Network for Action Detection | Proposes a semantic and motion-aware spatiotemporal Transformer network for action detection. | spatiotemporal | | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 23 | Boostlet.js: Image processing plugins for the web via JavaScript injection | Boostlet.js: provides image processing plugins for the web via JavaScript injection. | manipulation | | |