cs.CV (2024-11-03)

📊 8 papers in total | 🔗 4 with code

🎯 Navigation by interest area

Pillar 2: RL & Architecture (3 🔗2) · Pillar 9: Embodied Foundation Models (2) · Pillar 3: Perception & Semantics (2 🔗1) · Pillar 8: Physics-based Animation (1 🔗1)

🔬 Pillar 2: RL & Architecture (3 papers)

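| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 1 | Optical Flow Representation Alignment Mamba Diffusion Model for Medical Video Generation | MedSora: a Mamba diffusion model with optical flow representation alignment for high-quality medical video generation. | Mamba, optical flow |
| 2 | DreamPolish: Domain Score Distillation With Progressive Geometry Generation | DreamPolish: a text-to-3D generation method that combines domain score distillation with progressive geometry generation. | distillation, classifier-free guidance |
| 3 | VQ-Map: Bird's-Eye-View Map Layout Estimation in Tokenized Discrete Space via Vector Quantization | VQ-Map: uses vector quantization to estimate bird's-eye-view map layouts in a discrete space, setting several new records. | representation learning, semantic map, VQ-VAE |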

🔬 Pillar 9: Embodied Foundation Models (2 papers)

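| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 4 | EEE-Bench: A Comprehensive Multimodal Electrical And Electronics Engineering Benchmark | Proposes EEE-Bench to evaluate LMMs on electrical and electronics engineering problems, revealing their limitations in processing complex visual information. | large language model, foundation model, multimodal |
| 5 | Finding NeMo: Negative-mined Mosaic Augmentation for Referring Image Segmentation | Proposes NeMo, a negative-mined Mosaic data augmentation that improves referring image segmentation in complex scenes. | multimodal |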

🔬 Pillar 3: Perception & Semantics (2 papers)

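| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 6 | Exploring PCA-based feature representations of image pixels via CNN to enhance food image segmentation | Proposes a PCA-based CNN feature representation to improve food image segmentation. | open-vocabulary |
| 7 | Object segmentation from common fate: Motion energy processing enables human-like zero-shot generalization to random dot stimuli | Uses motion energy processing to achieve human-like zero-shot object segmentation on random dot stimuli. | optical flow |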

🔬 Pillar 8: Physics-based Animation (1 paper)

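| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 8 | FactorizePhys: Matrix Factorization for Multidimensional Attention in Remote Physiological Sensing | Proposes FactorizePhys, which uses matrix factorization to implement multidimensional attention for rPPG, improving signal extraction. | PULSE |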
