cs.CV (2024-05-26)

📊 16 papers in total | 🔗 8 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (7 🔗3) · Pillar 2: RL & Architecture (5 🔗2) · Pillar 3: Perception & Semantics (3 🔗3) · Pillar 1: Robot Control (1)

🔬 Pillar 9: Embodied Foundation Models (7 papers)

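# | Title | One-line summary | Tags | 🔗
1 | Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs | Reveals the multimodal generalization ability of frozen LLMs by investigating their internal implicit multimodal alignment mechanism. | large language model, multimodal
2 | M$^3$CoT: A Novel Benchmark for Multi-Domain Multi-step Multi-modal Chain-of-Thought | Proposes the M$^3$CoT benchmark for evaluating multi-domain, multi-step, multimodal chain-of-thought reasoning. | large language model, chain-of-thought
3 | Consistency-Guided Asynchronous Contrastive Tuning for Few-Shot Class-Incremental Tuning of Foundation Models | Proposes CoACT for few-shot class-incremental tuning of pretrained models, improving the learning of new classes. | foundation model
4 | CapS-Adapter: Caption-based MultiModal Adapter in Zero-Shot Classification | Proposes CapS-Adapter, which uses captions to build a multimodal adapter and improves zero-shot classification performance. | multimodal
5 | Exploring a Multimodal Fusion-based Deep Learning Network for Detecting Facial Palsy | Proposes a multimodal fusion-based deep learning network for the automatic detection of facial palsy. | multimodal
6 | Segmentation of Maya hieroglyphs through fine-tuned foundation models | Achieves accurate segmentation of Maya hieroglyphs by fine-tuning foundation models. | foundation model
7 | Towards Multi-Task Multi-Modal Models: A Video Generative Perspective | Proposes multi-task, multimodal video generation models that surpass existing methods in high-fidelity video synthesis and understanding. | foundation model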

🔬 Pillar 2: RL & Architecture (5 papers)

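# | Title | One-line summary | Tags | 🔗
8 | Sp2360: Sparse-view 360 Scene Reconstruction using Cascaded 2D Diffusion Priors | Sp2360: reconstructs 360 scenes from sparse views using cascaded 2D diffusion priors. | distillation, 3DGS, NeRF
9 | Diffusion4D: Fast Spatial-temporal Consistent 4D Generation via Video Diffusion Models | Diffusion4D: fast, spatially and temporally consistent 4D generation via video diffusion models. | distillation, gaussian splatting, splatting
10 | Demystify Mamba in Vision: A Linear Attention Perspective | Demystifies Mamba in vision through an in-depth analysis and improvement from a linear attention perspective. | Mamba, state space model, linear attention
11 | Image Deraining with Frequency-Enhanced State Space Model | Proposes a frequency-enhanced state space model for image deraining. | SSM, state space model
12 | ID-to-3D: Expressive ID-guided 3D Heads via Score Distillation Sampling | Proposes ID-to-3D, which uses Score Distillation Sampling to generate identity-consistent 3D head models with controllable expressions. | distillation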

🔬 Pillar 3: Perception & Semantics (3 papers)

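# | Title | One-line summary | Tags | 🔗
13 | Splat-SLAM: Globally Optimized RGB-only SLAM with 3D Gaussians | Splat-SLAM: a globally optimized, RGB-only monocular SLAM system based on 3D Gaussians. | monocular depth, 3D gaussian splatting, gaussian splatting
14 | Flow Snapshot Neurons in Action: Deep Neural Networks Generalize to Biological Motion Perception | Proposes Motion Perceiver, improving the generalization of AI models on biological motion perception. | optical flow, motion representation
15 | CRoFT: Robust Fine-Tuning with Concurrent Optimization for OOD Generalization and Open-Set OOD Detection | Proposes the CRoFT framework, which uses concurrent optimization to improve the robustness of VL-PTMs in OOD generalization and open-set OOD detection. | open-vocabulary, open vocabulary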

🔬 Pillar 1: Robot Control (1 paper)

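# | Title | One-line summary | Tags | 🔗
16 | 3D View Optimization for Improving Image Aesthetics | Proposes a view optimization method based on 3D scene reconstruction to improve image aesthetics. | manipulation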
