cs.CV (2024-11-13)

📊 24 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (8 🔗2) · Pillar 3: Perception & Semantics (6 🔗2) · Pillar 2: RL & Architecture (6 🔗1) · Pillar 5: Interaction & Reaction (1) · Pillar 1: Robot Control (1) · Pillar 4: Generative Motion (1 🔗1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 9: Embodied Foundation Models (8 papers)

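| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Weakly Supervised Temporal Action Localization via Dual-Prior Collaborative Learning Guided by Multimodal Large Language Models | Proposes the MLLM4WTAL framework, which uses multimodal large language models to guide weakly supervised temporal action localization. | large language model, foundation model, multimodal | |
| 2 | Multimodal Object Detection using Depth and Image Data for Manufacturing Parts | Proposes a multimodal object detection method based on depth and image data to make manufacturing-part recognition more reliable. | multimodal | |
| 3 | Bridging the Visual Gap: Fine-Tuning Multimodal Models with Knowledge-Adapted Captions | Proposes the KnowAda fine-tuning method, which improves the accuracy of small-scale vision-language models when generating knowledge-enriched image captions and reduces hallucinations. | multimodal | |
| 4 | MIRe: Enhancing Multimodal Queries Representation via Fusion-Free Modality Interaction for Multimodal Retrieval | MIRe: enhances multimodal query representations through fusion-free modality interaction for multimodal retrieval. | multimodal | |
| 5 | Retrieval Augmented Recipe Generation | Proposes a retrieval-augmented large multimodal model to address hallucination in recipe generation. | multimodal | |
| 6 | LG-Gaze: Learning Geometry-aware Continuous Prompts for Language-Guided Gaze Estimation | Proposes LG-Gaze, which learns geometry-aware continuous prompts for language-guided gaze estimation. | multimodal | |
| 7 | The VLLM Safety Paradox: Dual Ease in Jailbreak Attack and Defense | Reveals the VLLM safety paradox (jailbreak attack and defense are both easy) and proposes the LLM-Pipeline defense method. | large language model | |
| 8 | ReMP: Reusable Motion Prior for Multi-domain 3D Human Pose Estimation and Motion Inbetweening | Proposes ReMP, a reusable motion prior for multi-domain 3D human pose estimation and motion inbetweening. | foundation model | |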

🔬 Pillar 3: Perception & Semantics (6 papers)

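| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 9 | MBA-SLAM: Motion Blur Aware Gaussian Splatting SLAM | Proposes MBA-SLAM for high-accuracy SLAM in motion-blurred scenes, improving camera localization and map reconstruction quality. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 10 | 4D Gaussian Splatting in the Wild with Uncertainty-Aware Regularization | Proposes 4D Gaussian splatting with uncertainty-aware regularization for reconstructing dynamic scenes from in-the-wild monocular video. | gaussian splatting, splatting, scene reconstruction | |
| 11 | Biomass phenotyping of oilseed rape through UAV multi-view oblique imaging with 3DGS and SAM model | Combines 3DGS with the SAM model for high-precision 3D reconstruction and biomass phenotyping of oilseed rape. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 12 | BillBoard Splatting (BBSplat): Learnable Textured Primitives for Novel View Synthesis | BBSplat: a novel view synthesis method based on learnable textured primitives. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 13 | Towards More Accurate Fake Detection on Images Generated from Advanced Generative and Neural Rendering Models | Proposes an unsupervised method based on Fourier spectrum magnitude feature extraction to improve the accuracy of forgery detection on neural-rendered images. | 3D gaussian splatting, gaussian splatting, splatting | |
| 14 | OSMLoc: Single Image-Based Visual Localization in OpenStreetMap with Fused Geometric and Semantic Guidance | OSMLoc: single-image visual localization in OpenStreetMap with fused geometric and semantic guidance. | depth estimation, monocular depth, scene understanding | |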

🔬 Pillar 2: RL & Architecture (6 papers)

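| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 15 | Multimodal Instruction Tuning with Hybrid State Space Models | Proposes a hybrid Transformer-Mamba model that efficiently handles long-context multimodal inputs. | Mamba, state space model, large language model | |
| 16 | MambaXCTrack: Mamba-based Tracker with SSM Cross-correlation and Motion Prompt for Ultrasound Needle Tracking | Proposes MambaXCTrack to address the visibility problem in ultrasound needle tracking. | Mamba, SSM, state space model | |
| 17 | EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation | Proposes EgoVid-5M, a large-scale egocentric video dataset, to improve egocentric video generation. | dreamer, egocentric | |
| 18 | Scale Contrastive Learning with Selective Attentions for Blind Image Quality Assessment | Proposes the CSFIQA framework, which uses selective attention and contrastive learning to improve blind image quality assessment. | contrastive learning | |
| 19 | Dual-Head Knowledge Distillation: Enhancing Logits Utilization with an Auxiliary Head | Proposes dual-head knowledge distillation (DHKD) to address insufficient use of logit information and classification-head collapse. | distillation | |
| 20 | A survey on Graph Deep Representation Learning for Facial Expression Recognition | Survey of graph deep representation learning applied to facial expression recognition. | representation learning | |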

🔬 Pillar 5: Interaction & Reaction (1 paper)

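| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 21 | CoMiX: Cross-Modal Fusion with Deformable Convolutions for HSI-X Semantic Segmentation | Proposes CoMiX, which uses deformable convolutions for cross-modal fusion to improve hyperspectral image semantic segmentation. | HSI, multimodal | |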

🔬 Pillar 1: Robot Control (1 paper)

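| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 22 | A Survey on Vision Autoregressive Model | Surveys vision autoregressive models, covering image generation, video generation, and unified multimodal generation tasks. | manipulation, motion generation, multimodal | |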

🔬 Pillar 4: Generative Motion (1 paper)

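| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 23 | Motion Control for Enhanced Complex Action Video Generation | MVideo: a framework for complex-action video generation based on motion control with mask sequences. | motion generation | |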

🔬 Pillar 8: Physics-based Animation (1 paper)

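| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 24 | MikuDance: Animating Character Art with Mixed Motion Dynamics | MikuDance: a diffusion model for animating character art that incorporates mixed motion dynamics. | motion tracking | |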
