cs.CV(2025-01-06)

📊 26 papers in total | 🔗 5 with code

🎯 Navigation by area of interest

Pillar 9: Embodied Foundation Models (14 papers, 🔗 3 with code)
Pillar 2: RL & Architecture (5 papers, 🔗 1 with code)
Pillar 3: Perception & Semantics (4 papers, 🔗 1 with code)
Pillar 6: Video Extraction (2 papers)
Pillar 1: Robot Control (1 paper)

🔬 Pillar 9: Embodied Foundation Models (14 papers)

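# | Title | One-line summary | Tags | 🔗
1 | EAGLE: Enhanced Visual Grounding Minimizes Hallucinations in Instructional Multimodal Models | Strengthens visual grounding to minimize hallucinations in instruction-tuned multimodal models. | large language model, multimodal, visual grounding |
2 | Socratic Questioning: Learn to Self-guide Multimodal Reasoning in the Wild | Proposes the Socratic Questioning framework to improve multimodal LLM performance on complex visual reasoning. | large language model, multimodal, chain-of-thought |
3 | CM3T: Framework for Efficient Multimodal Learning for Inhomogeneous Interaction Datasets | An efficient multimodal learning framework for inhomogeneous (heterogeneous) interaction datasets. | multimodal |
4 | MObI: Multimodal Object Inpainting Using Diffusion Models | Proposes a diffusion-based multimodal object inpainting framework for data augmentation in autonomous driving scenes. | multimodal |
5 | MVP: Multimodal Emotion Recognition based on Video and Physiological Signals | Proposes the MVP model, which fuses video and physiological signals to improve emotion recognition over long time sequences. | multimodal |
6 | FoundPAD: Foundation Models Reloaded for Face Presentation Attack Detection | Uses "reloaded" foundation models for face presentation attack detection. | foundation model |
7 | Large Language Models for Video Surveillance Applications | Proposes a vision-language-model-based method for summarizing surveillance video, improving analysis accuracy and efficiency. | large language model |
8 | Visual Large Language Models for Generalized and Specialized Applications | Surveys visual large language models in general-purpose and specialized applications, and discusses their challenges and future directions. | large language model |
9 | Ultrasound-QBench: Can LLMs Aid in Quality Assessment of Ultrasound Imaging? | Uses multimodal large language models to assist in assessing ultrasound image quality. | large language model, multimodal |
10 | SAM-EM: Real-Time Segmentation for Automated Liquid Phase Transmission Electron Microscopy | A real-time segmentation method for automated liquid-phase transmission electron microscopy. | foundation model |
11 | CAT: Content-Adaptive Image Tokenization | Proposes CAT, a content-adaptive image tokenization method that improves image reconstruction and generation. | large language model |
12 | SceneVTG++: Controllable Multilingual Visual Text Generation in the Wild | Proposes a controllable multilingual visual text generation method that tackles text generation in natural scene images. | large language model |
13 | MDP3: A Training-free Approach for List-wise Frame Selection in Video-LLMs | Proposes MDP3 to address frame selection in video large language models. | large language model |
14 | Found in Translation: semantic approaches for enhancing AI interpretability in face verification | Proposes an XAI framework based on semantic concepts to improve the interpretability of face verification models. | large language model |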

🔬 Pillar 2: RL & Architecture (5 papers)

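# | Title | One-line summary | Tags | 🔗
15 | Gaussian Masked Autoencoders | Proposes Gaussian Masked Autoencoders (GMAE), which jointly learn semantic abstraction and spatial understanding, enabling zero-shot spatial understanding. | representation learning, masked autoencoder, MAE |
16 | AE-NeRF: Augmenting Event-Based Neural Radiance Fields for Non-ideal Conditions and Larger Scene | Improves event-camera NeRF reconstruction under non-ideal conditions and in larger scenes. | distillation, NeRF, neural radiance field |
17 | Human Gaze Boosts Object-Centered Representation Learning | Proposes an object-centric representation learning method guided by human gaze, improving self-supervised learning performance. | representation learning, egocentric, Ego4D |
18 | First-place Solution for Streetscape Shop Sign Recognition Competition | Proposes a multi-stage fusion framework for recognizing shop-sign text in complex street scenes. | reinforcement learning, multimodal |
19 | CCStereo: Audio-Visual Contextual and Contrastive Learning for Binaural Audio Generation | Proposes a binaural audio generation model based on audio-visual contextual and contrastive learning, improving spatial detail. | contrastive learning |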

🔬 Pillar 3: Perception & Semantics (4 papers)

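# | Title | One-line summary | Tags | 🔗
20 | Compression of 3D Gaussian Splatting with Optimized Feature Planes and Standard Video Codecs | Proposes a 3D Gaussian Splatting compression method built on optimized feature planes and standard video codecs. | 3D gaussian splatting, gaussian splatting, splatting |
21 | Spiking monocular event based 6D pose estimation for space application | Proposes a spiking-neural-network method for monocular event-camera 6D pose estimation in space applications. | 6D pose estimation |
22 | Pointmap-Conditioned Diffusion for Consistent Novel View Synthesis | PointmapDiff: uses a pointmap-conditioned diffusion model to achieve consistent novel view synthesis. | 3D gaussian splatting, gaussian splatting, splatting |
23 | ProTracker: Probabilistic Integration for Robust and Accurate Point Tracking | A robust and accurate video point tracking framework based on probabilistic integration. | optical flow |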

🔬 Pillar 6: Video Extraction (2 papers)

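# | Title | One-line summary | Tags | 🔗
24 | HaWoR: World-Space Hand Motion Reconstruction from Egocentric Videos | Proposes a high-fidelity method for reconstructing world-space hand motion from egocentric videos. | egocentric, hand reconstruction |
25 | WorldPose: A World Cup Dataset for Global 3D Human Pose Estimation | Introduces a multi-view 3D human pose estimation dataset built from World Cup footage that challenges existing methods. | SMPL |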

🔬 Pillar 1: Robot Control (1 paper)

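# | Title | One-line summary | Tags | 🔗
26 | HOGSA: Bimanual Hand-Object Interaction Understanding with 3D Gaussian Splatting Based Data Augmentation | Proposes a data augmentation framework based on 3D Gaussian Splatting to improve understanding of bimanual hand-object interaction. | bi-manual, 3D gaussian splatting, 3DGS |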
