cs.CV (2024-12-28)
📊 16 papers in total | 🔗 4 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (5 🔗3)
Pillar 2: RL & Architecture (5)
Pillar 3: Perception & Semantics (3 🔗1)
Pillar 4: Generative Motion (1)
Pillar 7: Motion Retargeting (1)
Pillar 8: Physics-based Animation (1)
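🔬 Pillar 9: Embodied Foundation Models (5 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | ST$^3$: Accelerating Multimodal Large Language Model by Spatial-Temporal Visual Token Trimming | Proposes the ST³ framework, which accelerates multimodal large language model inference through spatial-temporal visual token trimming. | large language model, multimodal | | |
| 2 | Towards Visual Grounding: A Survey | A survey of visual grounding that systematically reviews recent advances and challenges to advance multimodal understanding. | multimodal, visual grounding | ✅ | |
| 3 | Exploring Compositional Generalization of Multimodal LLMs for Medical Imaging | Explores the compositional generalization of multimodal LLMs in medical imaging, revealing the underlying mechanism of multi-task training. | large language model, multimodal | ✅ | |
| 4 | VELoRA: A Low-Rank Adaptation Approach for Efficient RGB-Event based Recognition | Proposes VELoRA, an efficient RGB-Event recognition method based on low-rank adaptation. | foundation model | ✅ | |
| 5 | MADiff: Text-Guided Fashion Image Editing with Mask Prediction and Attention-Enhanced Diffusion | MADiff: proposes MaskNet and an attention-enhanced diffusion model for text-guided fashion image editing. | large language model | | |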
🔬 Pillar 2: RL & Architecture (5 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 6 | Multi-Modality Driven LoRA for Adverse Condition Depth Estimation | Proposes MMD-LoRA, a multi-modality driven LoRA method that improves depth estimation in adverse weather. | contrastive learning, depth estimation, multimodal | | |
| 7 | MambaVO: Deep Visual Odometry Based on Sequential Matching Refinement and Training Smoothing | Proposes MambaVO to address matching ambiguity in deep visual odometry. | Mamba, visual odometry | | |
| 8 | DepthMamba with Adaptive Fusion | Proposes DepthMamba, based on Mamba and adaptive fusion, to improve the robustness of multi-view depth estimation under noisy poses. | Mamba, depth estimation | | |
| 9 | Injecting Explainability and Lightweight Design into Weakly Supervised Video Anomaly Detection Systems | Proposes TCVADS to address efficiency, accuracy, and explainability challenges in weakly supervised video anomaly detection, suited to smart-city surveillance. | contrastive learning, distillation, multimodal | | |
| 10 | STNMamba: Mamba-based Spatial-Temporal Normality Learning for Video Anomaly Detection | Proposes STNMamba, a Mamba-based spatial-temporal normality learning approach for video anomaly detection. | Mamba | | |
🔬 Pillar 3: Perception & Semantics (3 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 11 | GSplatLoc: Ultra-Precise Camera Localization via 3D Gaussian Splatting | GSplatLoc: ultra-precise camera localization using 3D Gaussian splatting. | 3D gaussian splatting, gaussian splatting, splatting | | |
| 12 | DEGSTalk: Decomposed Per-Embedding Gaussian Fields for Hair-Preserving Talking Face Synthesis | Proposes DEGSTalk, a hair-preserving talking face synthesis method based on 3D Gaussian fields. | 3D gaussian splatting, 3DGS, gaussian splatting | ✅ | |
| 13 | Enhancing Marine Debris Acoustic Monitoring by Optical Flow-Based Motion Vector Analysis | Proposes a marine debris monitoring method for underwater acoustic cameras based on optical-flow motion vector analysis. | optical flow | | |
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 14 | SyncDiff: Synchronized Motion Diffusion for Multi-Body Human-Object Interaction Synthesis | Proposes SyncDiff, which addresses multi-body human-object interaction motion synthesis through synchronized motion diffusion. | motion diffusion, motion synthesis, human-object interaction | | |
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 15 | Cross-Modal Mapping: Mitigating the Modality Gap for Few-Shot Image Classification | Proposes the Cross-Modal Mapping (CMM) method to mitigate the modality gap in few-shot image classification. | spatial relationship | | |
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 16 | DAVE: Diverse Atomic Visual Elements Dataset with High Representation of Vulnerable Road Users in Complex and Unpredictable Environments | DAVE: a dataset of vulnerable road users that improves the robustness of visual perception algorithms in complex environments. | spatiotemporal | | |