cs.CV (2024-09-20)

📊 29 papers in total | 🔗 9 with code

🎯 Navigation by interest area

Pillar 9: Embodied Foundation Models (12 papers, 🔗 3 with code)
Pillar 3: Perception & Semantics (6 papers, 🔗 4 with code)
Pillar 2: RL & Architecture (6 papers, 🔗 1 with code)
Pillar 1: Robot Control (3 papers)
Pillar 6: Video Extraction (1 paper, 🔗 1 with code)
Pillar 7: Motion Retargeting (1 paper)

🔬 Pillar 9: Embodied Foundation Models (12 papers)

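# | Title | One-line summary | Tags
1 | SSE: Multimodal Semantic Data Selection and Enrichment for Industrial-scale Data Assimilation | Proposes the SSE framework, which uses semantic selection and enrichment to tackle data overload in industrial-scale data assimilation | foundation model, multimodal
2 | Instruction-guided Multi-Granularity Segmentation and Captioning with Large Multimodal Model | Proposes MGLMM, which performs instruction-guided multi-granularity segmentation and captioning, addressing the limitations of existing LMMs in fine-grained understanding and segmentation | large language model, multimodal
3 | Validation & Exploration of Multimodal Deep-Learning Camera-Lidar Calibration models | Studies multimodal deep-learning models for dynamic camera-LiDAR calibration | multimodal
4 | MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension | MaPPER: a multimodal prior-guided, parameter-efficient tuning method for referring expression comprehension | multimodal
5 | Towards Child-Inclusive Clinical Video Understanding for Autism Spectrum Disorder | Proposes a multimodal-fusion approach to clinical video understanding for children with autism | large language model, foundation model, multimodal
6 | Portrait Video Editing Empowered by Multimodal Generative Priors | PortraitGen: a portrait video editing method built on multimodal generative priors that achieves consistent and expressive stylization | multimodal
7 | Physics-Informed Latent Diffusion for Multimodal Brain MRI Synthesis | Proposes a physics-informed latent diffusion model for multimodal brain MRI synthesis that addresses missing modalities | multimodal
8 | AVG-LLaVA: An Efficient Large Multimodal Model with Adaptive Visual Granularity | AVG-LLaVA: an efficient large multimodal model with adaptive visual granularity | multimodal
9 | A Novel Adaptive Fine-Tuning Algorithm for Multimodal Models: Self-Optimizing Classification and Selection of High-Quality Datasets in Remote Sensing | Proposes an adaptive fine-tuning algorithm that selects and optimizes high-quality datasets for remote sensing multimodal models | multimodal
10 | FullAnno: A Data Engine for Enhancing Image Comprehension of MLLMs | FullAnno: a data engine for enhancing the image comprehension of MLLMs | large language model, multimodal
11 | TalkMosaic: Interactive PhotoMosaic with Multi-modal LLM Q&A Interactions | Proposes TalkMosaic, an interactive photo mosaic driven by multimodal LLM question-and-answer interactions | multimodal
12 | Efficient and Discriminative Image Feature Extraction for Universal Image Retrieval | Proposes an efficient feature-extraction framework for universal image retrieval that addresses domain generalization | foundation model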

🔬 Pillar 3: Perception & Semantics (6 papers)

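# | Title | One-line summary | Tags
13 | HMD^2: Environment-aware Motion Generation from Single Egocentric Head-Mounted Device | HMD^2: environment-aware full-body motion generation from a single egocentric head-mounted device | visual SLAM, motion diffusion model, motion diffusion
14 | Elite-EvGS: Learning Event-based 3D Gaussian Splatting by Distilling Event-to-Video Priors | Elite-EvGS: learns event-based 3D Gaussian Splatting by distilling event-to-video priors | 3D gaussian splatting, 3DGS, gaussian splatting
15 | 3D-GSW: 3D Gaussian Splatting for Robust Watermarking | Proposes 3D-GSW, a robust watermarking technique for 3D Gaussian Splatting models that protects the copyright of both the models and their rendered images | 3D gaussian splatting, gaussian splatting, splatting
16 | V^3: Viewing Volumetric Videos on Mobiles via Streamable 2D Dynamic Gaussians | V^3: high-quality volumetric video rendering on mobile devices via streamable 2D dynamic Gaussians | 3DGS
17 | DAP-LED: Learning Degradation-Aware Priors with CLIP for Joint Low-light Enhancement and Deblurring | Proposes DAP-LED, which uses CLIP to learn degradation-aware priors for joint low-light enhancement and deblurring | depth estimation
18 | CVT-Occ: Cost Volume Temporal Fusion for 3D Occupancy Prediction | CVT-Occ: improves 3D occupancy prediction accuracy through temporal cost-volume fusion | depth estimation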

🔬 Pillar 2: RL & Architecture (6 papers)

# | Title | One-line summary | Tags
19 | OneBEV: Using One Panoramic Image for Bird's-Eye-View Semantic Mapping | OneBEV: Bird's-Eye-View semantic mapping from a single panoramic image | Mamba, semantic mapping, semantic map
20 | RingMo-Aerial: An Aerial Remote Sensing Foundation Model With Affine Transformation Contrastive Learning | RingMo-Aerial: a foundation model for aerial remote sensing imagery built on affine-transformation contrastive learning | contrastive learning, foundation model
21 | Brain-Cognition Fingerprinting via Graph-GCCA with Contrastive Learning | Proposes CoGraCa, a graph generalized canonical correlation analysis with contrastive learning, for brain-cognition fingerprinting | contrastive learning, multimodal
22 | LCM: Log Conformal Maps for Robust Representation Learning to Mitigate Perspective Distortion | Proposes Log Conformal Maps (LCM) for robust representation learning that mitigates perspective distortion | representation learning, spatial relationship
23 | ViTGuard: Attention-aware Detection against Adversarial Examples for Vision Transformer | ViTGuard: an attention-aware method for detecting adversarial examples to defend Vision Transformers against attacks | masked autoencoder, MAE, spatial relationship
24 | Interpret the Predictions of Deep Networks via Re-Label Distillation | Proposes re-label distillation for interpreting the predictions of deep networks | distillation

🔬 Pillar 1: Robot Control (3 papers)

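# | Title | One-line summary | Tags
25 | Manipulation Facing Threats: Evaluating Physical Vulnerabilities in End-to-End Vision Language Action Models | Proposes the PVEP evaluation pipeline to analyze the robustness of VLAMs under physical threats | manipulation, open-vocabulary, open vocabulary
26 | T2M-X: Learning Expressive Text-to-Motion Generation from Partially Annotated Data | T2M-X: learns expressive text-to-motion generation from partially annotated data | humanoid, text-to-motion, motion generation
27 | ID-Guard: A Universal Framework for Combating Facial Manipulation via Breaking Identification | Proposes the ID-Guard framework, which counters facial manipulation by breaking identity information to protect personal privacy | manipulation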

🔬 Pillar 6: Video Extraction (1 paper)

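# | Title | One-line summary | Tags
28 | YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models | Proposes the YesBut dataset for evaluating how well vision-language models comprehend satirical images | HuMoR, multimodal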

🔬 Pillar 7: Motion Retargeting (1 paper)

# | Title | One-line summary | Tags
29 | High-Fidelity Mask-free Neural Surface Reconstruction for Virtual Reality | Hi-NeuS: mask-free, high-fidelity neural surface reconstruction for virtual reality | geometric consistency
