cs.CV (2024-10-31)

📊 29 papers in total | 🔗 7 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (12) | Pillar 3: Perception & Semantics (8 🔗3) | Pillar 2: RL & Architecture (4 🔗1) | Pillar 4: Generative Motion (2 🔗2) | Pillar 5: Interaction & Reaction (1 🔗1) | Pillar 1: Robot Control (1) | Pillar 8: Physics-based Animation (1)

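🔬 Pillar 9: Embodied Foundation Models (12 papers)

| # | Title | Key point | Tags | 🔗 |
|---|-------|-----------|------|----|
| 1 | Parameter-Efficient Fine-Tuning Medical Multimodal Large Language Models for Medical Visual Grounding | Proposes PFMVG, a parameter-efficiently fine-tuned medical multimodal large language model for medical visual grounding. | large language model, multimodal, visual grounding | |
| 2 | Evaluating Cell AI Foundation Models in Kidney Pathology with Human-in-the-Loop Enrichment | Evaluates cell AI foundation models on kidney pathology segmentation using human-in-the-loop data enrichment. | foundation model | |
| 3 | Nearest Neighbor Normalization Improves Multimodal Retrieval | Proposes Nearest Neighbor Normalization (NNN), which improves multimodal retrieval performance without additional training. | multimodal | |
| 4 | Handwriting Recognition in Historical Documents with Multimodal LLM | Uses multimodal LLMs to tackle handwriting recognition in historical documents, exploring the potential of the Gemini model. | multimodal | |
| 5 | FRoundation: Are Foundation Models Ready for Face Recognition? | Explores the potential of foundation models for face recognition and proposes a fine-tuning strategy for adapting them. | foundation model | |
| 6 | Using Multimodal Deep Neural Networks to Disentangle Language from Visual Aesthetics | Uses multimodal deep neural networks to disentangle language from visual aesthetics. | multimodal | |
| 7 | Phrase Decoupling Cross-Modal Hierarchical Matching and Progressive Position Correction for Visual Grounding | Proposes a visual grounding method based on phrase-decoupled cross-modal hierarchical matching and progressive position correction. | visual grounding | |
| 8 | Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach | Proposes an LLM-driven data approach for web-scale visual entity recognition. | large language model, multimodal | |
| 9 | ResiDual Transformer Alignment with Spectral Decomposition | Proposes ResiDual, which aligns the Transformer residual stream via spectral decomposition to improve zero-shot classification performance. | multimodal | |
| 10 | Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts | Stereo-Talker: proposes a prior-guided mixture-of-experts model for high-quality audio-driven 3D human video synthesis. | large language model | |
| 11 | Modality and Task Adaptation for Enhanced Zero-shot Composed Image Retrieval | Proposes MoTa-Adapter to address modality and task discrepancies in zero-shot composed image retrieval. | large language model | |
| 12 | Posture-Informed Muscular Force Learning for Robust Hand Pressure Estimation | PiMForce: uses posture information to enhance muscular force learning for robust hand pressure estimation. | multimodal | |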

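🔬 Pillar 3: Perception & Semantics (8 papers)

| # | Title | Key point | Tags | 🔗 |
|---|-------|-----------|------|----|
| 13 | GaussianMarker: Uncertainty-Aware Copyright Protection of 3D Gaussian Splatting | Proposes GaussianMarker for copyright protection and invisible watermark embedding in 3D Gaussian Splatting models. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 14 | ImOV3D: Learning Open-Vocabulary Point Clouds 3D Object Detection from Only 2D Images | ImOV3D: learns open-vocabulary 3D point cloud object detection from 2D images only. | depth estimation, monocular depth, open-vocabulary | |
| 15 | Self-Ensembling Gaussian Splatting for Few-Shot Novel View Synthesis | Proposes Self-Ensembling Gaussian Splatting (SE-GS) to address overfitting in few-shot novel view synthesis. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 16 | GeoSplatting: Towards Geometry Guided Gaussian Splatting for Physically-based Inverse Rendering | GeoSplatting: physically-based inverse rendering via geometry-guided Gaussian splatting. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 17 | Optical Lens Attack on Monocular Depth Estimation for Autonomous Driving | Proposes LensAttack to address security risks in monocular depth estimation. | depth estimation, monocular depth | |
| 18 | Aquatic-GS: A Hybrid 3D Representation for Underwater Scenes | Proposes Aquatic-GS, a hybrid 3D representation for underwater scenes that models both the water medium and objects, enabling high-quality rendering and restoration. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 19 | GS-Blur: A 3D Scene-Based Dataset for Realistic Image Deblurring | Proposes GS-Blur, a realistic image deblurring dataset built on 3D Gaussian Splatting. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 20 | XRDSLAM: A Flexible and Modular Framework for Deep Learning based SLAM | XRDSLAM: a flexible and modular deep learning SLAM framework that is easy to extend and evaluate. | 3DGS, NeRF | |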

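🔬 Pillar 2: RL & Architecture (4 papers)

| # | Title | Key point | Tags | 🔗 |
|---|-------|-----------|------|----|
| 21 | JEMA: A Joint Embedding Framework for Scalable Co-Learning with Multimodal Alignment | JEMA: a joint embedding framework for scalable co-learning with multimodal alignment. | contrastive learning, multimodal | |
| 22 | MLLA-UNet: Mamba-like Linear Attention in an Efficient U-Shape Model for Medical Image Segmentation | Proposes MLLA-UNet, which combines linear attention with Mamba mechanisms for efficient medical image segmentation. | Mamba, linear attention | |
| 23 | NIMBA: Towards Robust and Principled Processing of Point Clouds With SSMs | NIMBA: robust and principled point cloud processing with SSMs. | Mamba, SSM, state space model | |
| 24 | Semantic Knowledge Distillation for Onboard Satellite Earth Observation Image Classification | Proposes a dynamically weighted knowledge distillation framework for efficient classification of satellite remote sensing images under resource constraints. | distillation | |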

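🔬 Pillar 4: Generative Motion (2 papers)

| # | Title | Key point | Tags | 🔗 |
|---|-------|-----------|------|----|
| 25 | Fashion-VDM: Video Diffusion Model for Virtual Try-On | Fashion-VDM: a video diffusion model for virtual try-on video generation. | classifier-free guidance | |
| 26 | Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning | DEMO: enhances motion in text-to-video generation through decomposed encoding and conditioning. | motion synthesis | |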

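🔬 Pillar 5: Interaction & Reaction (1 paper)

| # | Title | Key point | Tags | 🔗 |
|---|-------|-----------|------|----|
| 27 | EZ-HOI: VLM Adaptation via Guided Prompt Learning for Zero-Shot HOI Detection | Proposes EZ-HOI, which adapts VLMs for zero-shot HOI detection via guided prompt learning. | human-object interaction, HOI, large language model | |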

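🔬 Pillar 1: Robot Control (1 paper)

| # | Title | Key point | Tags | 🔗 |
|---|-------|-----------|------|----|
| 28 | Language-guided Hierarchical Fine-grained Image Forgery Detection and Localization | Proposes HiFi-Net++, a language-guided hierarchical fine-grained approach to image forgery detection and localization. | manipulation, representation learning | |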

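🔬 Pillar 8: Physics-based Animation (1 paper)

| # | Title | Key point | Tags | 🔗 |
|---|-------|-----------|------|----|
| 29 | DELTA: Dense Efficient Long-range 3D Tracking for any video | DELTA: an efficient dense long-range 3D tracking method for any video. | motion tracking | |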
