cs.CV (2024-12-28)

📊 16 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (5 papers, 🔗 3 with code)
Pillar 2: RL & Architecture (5 papers)
Pillar 3: Perception & Semantics (3 papers, 🔗 1 with code)
Pillar 4: Generative Motion (1 paper)
Pillar 7: Motion Retargeting (1 paper)
Pillar 8: Physics-based Animation (1 paper)

🔬 Pillar 9: Embodied Foundation Models (5 papers)

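# | Title | One-line summary | Tags | 🔗
1 | ST³: Accelerating Multimodal Large Language Model by Spatial-Temporal Visual Token Trimming | Proposes the ST³ framework, which accelerates multimodal large language model inference by trimming visual tokens along the spatial and temporal dimensions. | large language model, multimodal |
2 | Towards Visual Grounding: A Survey | A survey of visual grounding that systematically reviews recent advances and open challenges, with the aim of advancing multimodal understanding. | multimodal, visual grounding |
3 | Exploring Compositional Generalization of Multimodal LLMs for Medical Imaging | Explores the compositional generalization ability of multimodal LLMs in medical imaging and sheds light on the underlying mechanism of multi-task training. | large language model, multimodal |
4 | VELoRA: A Low-Rank Adaptation Approach for Efficient RGB-Event based Recognition | Proposes VELoRA, an efficient RGB-Event recognition method based on low-rank adaptation. | foundation model |
5 | MADiff: Text-Guided Fashion Image Editing with Mask Prediction and Attention-Enhanced Diffusion | Proposes MaskNet and an attention-enhanced diffusion model for text-guided fashion image editing. | large language model |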

🔬 Pillar 2: RL & Architecture (5 papers)

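# | Title | One-line summary | Tags | 🔗
6 | Multi-Modality Driven LoRA for Adverse Condition Depth Estimation | Proposes MMD-LoRA, a multi-modality-driven LoRA method that improves depth estimation under adverse weather conditions. | contrastive learning, depth estimation, multimodal |
7 | MambaVO: Deep Visual Odometry Based on Sequential Matching Refinement and Training Smoothing | Proposes MambaVO to address matching ambiguity in deep visual odometry. | Mamba, visual odometry |
8 | DepthMamba with Adaptive Fusion | Proposes DepthMamba, which combines Mamba with adaptive fusion to make multi-view depth estimation more robust to noisy camera poses. | Mamba, depth estimation |
9 | Injecting Explainability and Lightweight Design into Weakly Supervised Video Anomaly Detection Systems | Proposes TCVADS, which tackles the efficiency, accuracy, and explainability challenges of weakly supervised video anomaly detection and targets smart-city surveillance. | contrastive learning, distillation, multimodal |
10 | STNMamba: Mamba-based Spatial-Temporal Normality Learning for Video Anomaly Detection | Proposes STNMamba, a Mamba-based spatial-temporal normality learning method for video anomaly detection. | Mamba |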

🔬 Pillar 3: Perception & Semantics (3 papers)

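# | Title | One-line summary | Tags | 🔗
11 | GSplatLoc: Ultra-Precise Camera Localization via 3D Gaussian Splatting | Uses 3D Gaussian Splatting to achieve ultra-precise camera localization. | 3D gaussian splatting, gaussian splatting, splatting |
12 | DEGSTalk: Decomposed Per-Embedding Gaussian Fields for Hair-Preserving Talking Face Synthesis | Proposes DEGSTalk, a hair-preserving talking-face synthesis method based on 3D Gaussian fields. | 3D gaussian splatting, 3DGS, gaussian splatting |
13 | Enhancing Marine Debris Acoustic Monitoring by Optical Flow-Based Motion Vector Analysis | Proposes a marine debris monitoring method for underwater acoustic cameras based on optical-flow motion-vector analysis. | optical flow |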

🔬 Pillar 4: Generative Motion (1 paper)

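# | Title | One-line summary | Tags | 🔗
14 | SyncDiff: Synchronized Motion Diffusion for Multi-Body Human-Object Interaction Synthesis | Proposes SyncDiff, which uses synchronized motion diffusion to synthesize multi-body human-object interaction motion. | motion diffusion, motion synthesis, human-object interaction |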

🔬 Pillar 7: Motion Retargeting (1 paper)

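# | Title | One-line summary | Tags | 🔗
15 | Cross-Modal Mapping: Mitigating the Modality Gap for Few-Shot Image Classification | Proposes Cross-Modal Mapping (CMM) to mitigate the modality gap in few-shot image classification. | spatial relationship |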

🔬 Pillar 8: Physics-based Animation (1 paper)

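# | Title | One-line summary | Tags | 🔗
16 | DAVE: Diverse Atomic Visual Elements Dataset with High Representation of Vulnerable Road Users in Complex and Unpredictable Environments | Introduces DAVE, a dataset rich in vulnerable road users, to improve the robustness of visual perception algorithms in complex environments. | spatiotemporal |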
