cs.CV (2024-05-26)
📊 16 papers in total | 🔗 8 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (7 🔗3)
Pillar 2: RL & Architecture (5 🔗2)
Pillar 3: Perception & Semantics (3 🔗3)
Pillar 1: Robot Control (1)
🔬 Pillar 9: Embodied Foundation Models (7 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs | Reveals how frozen LLMs generalize to multimodal inputs by probing their internal implicit multimodal alignment mechanism | large language model multimodal | ✅ | |
| 2 | M$^3$CoT: A Novel Benchmark for Multi-Domain Multi-step Multi-modal Chain-of-Thought | Proposes the M$^3$CoT benchmark for evaluating multi-domain, multi-step, multimodal chain-of-thought reasoning | large language model chain-of-thought | | |
| 3 | Consistency-Guided Asynchronous Contrastive Tuning for Few-Shot Class-Incremental Tuning of Foundation Models | Proposes CoACT for few-shot class-incremental tuning of pretrained models, improving the learning of new classes | foundation model | ✅ | |
| 4 | CapS-Adapter: Caption-based MultiModal Adapter in Zero-Shot Classification | Proposes CapS-Adapter, which builds a multimodal adapter from captions to improve zero-shot classification performance | multimodal | ✅ | |
| 5 | Exploring a Multimodal Fusion-based Deep Learning Network for Detecting Facial Palsy | Proposes a multimodal fusion-based deep learning network for automatic detection of facial palsy | multimodal | | |
| 6 | Segmentation of Maya hieroglyphs through fine-tuned foundation models | Achieves precise segmentation of Maya hieroglyphs by fine-tuning foundation models | foundation model | | |
| 7 | Towards Multi-Task Multi-Modal Models: A Video Generative Perspective | Proposes a multi-task multimodal video generation model that surpasses existing methods in high-fidelity video synthesis and understanding | foundation model | | |
🔬 Pillar 2: RL & Architecture (5 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | Sp2360: Sparse-view 360 Scene Reconstruction using Cascaded 2D Diffusion Priors | Sp2360: sparse-view 360 scene reconstruction using cascaded 2D diffusion priors | distillation 3DGS NeRF | | |
| 9 | Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models | Diffusion4D: fast, spatio-temporally consistent 4D generation via video diffusion models | distillation gaussian splatting splatting | | |
| 10 | Demystify Mamba in Vision: A Linear Attention Perspective | Demystifies Mamba in vision through an in-depth analysis and improvement from a linear attention perspective | Mamba state space model linear attention | ✅ | |
| 11 | Image Deraining with Frequency-Enhanced State Space Model | Proposes a frequency-enhanced state space model for image deraining | SSM state space model | ✅ | |
| 12 | ID-to-3D: Expressive ID-guided 3D Heads via Score Distillation Sampling | Proposes ID-to-3D, which uses Score Distillation Sampling to generate identity-consistent 3D head models with controllable expressions | distillation | | |
🔬 Pillar 3: Perception & Semantics (3 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 13 | Splat-SLAM: Globally Optimized RGB-only SLAM with 3D Gaussians | Splat-SLAM: a globally optimized, RGB-only monocular SLAM system based on 3D Gaussians | monocular depth 3D gaussian splatting gaussian splatting | ✅ | |
| 14 | Flow Snapshot Neurons in Action: Deep Neural Networks Generalize to Biological Motion Perception | Proposes Motion Perceiver, which improves the generalization of AI models on biological motion perception | optical flow motion representation | ✅ | |
| 15 | CRoFT: Robust Fine-Tuning with Concurrent Optimization for OOD Generalization and Open-Set OOD Detection | Proposes the CRoFT framework, which uses concurrent optimization to improve the robustness of VL-PTMs in OOD generalization and open-set OOD detection | open-vocabulary open vocabulary | ✅ | |
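🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 16 | 3D View Optimization for Improving Image Aesthetics | Proposes a view optimization method based on 3D scene reconstruction to improve image aesthetic quality | manipulation | | |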