cs.CV(2024-10-23)

📊 21 papers in total | 🔗 9 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (7 🔗3) · Pillar 3: Perception & Semantics (6 🔗2) · Pillar 9: Embodied Foundation Models (5 🔗2) · Pillar 1: Robot Control (2 🔗1) · Pillar 6: Video Extraction (1 🔗1)

🔬 Pillar 2: RL & Architecture (7 papers)

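| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | EntityCLIP: Entity-Centric Image-Text Matching via Multimodal Attentive Contrastive Learning | EntityCLIP: entity-centric image-text matching via multimodal attentive contrastive learning | contrastive learning, large language model, multimodal | |
| 2 | Enhancing Multimodal Medical Image Classification using Cross-Graph Modal Contrastive Learning | Proposes CGMCL, a cross-graph modal contrastive learning framework that improves multimodal medical image classification. | representation learning, contrastive learning, multimodal | |
| 3 | ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning | ADEM-VL: proposes an adaptive embedded fusion method for efficient fine-tuning of vision-language models. | representation learning, large language model, multimodal | |
| 4 | MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models | MIA-DPO: multi-image augmented direct preference optimization that improves multi-image understanding in large vision-language models | DPO, direct preference optimization | |
| 5 | Towards Effective Data-Free Knowledge Distillation via Diverse Diffusion Augmentation | Proposes a data-free knowledge distillation method based on diffusion augmentation that increases the diversity of synthetic data. | teacher-student distillation | |
| 6 | Rethinking Positive Pairs in Contrastive Learning | SimLAP: learns visual representations from arbitrary sample pairs, lifting contrastive learning's restriction to positive pairs | contrastive learning | |
| 7 | CLEAR: Character Unlearning in Textual and Visual Modalities | Proposes CLEAR, an open benchmark for machine unlearning in textual and visual modalities. | DPO, multimodal | |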

🔬 Pillar 3: Perception & Semantics (6 papers)

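| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 8 | PLGS: Robust Panoptic Lifting with 3D Gaussian Splatting | Proposes PLGS to tackle panoptic segmentation with 3D Gaussians under noise | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 9 | VR-Splatting: Foveated Radiance Field Rendering via 3D Gaussian Splatting and Neural Points | VR-Splatting: foveated radiance field rendering that combines 3D Gaussian splatting with neural points to improve the VR experience | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 10 | Efficient Neural Implicit Representation for 3D Human Reconstruction | Proposes HumanAvatar, which combines HuMoR, Instant-NGP, and Fast-SNARF to reconstruct 3D human avatars efficiently. | NeRF, neural radiance field, implicit representation | |
| 11 | OVT-B: A New Large-Scale Benchmark for Open-Vocabulary Multi-Object Tracking | Builds OVT-B, a large-scale benchmark for open-vocabulary multi-object tracking, and proposes a baseline method that incorporates motion features. | open-vocabulary, open vocabulary | |
| 12 | Few-shot NeRF by Adaptive Rendering Loss Regularization | Proposes AR-NeRF, which addresses few-shot NeRF novel view synthesis through adaptive rendering loss regularization | NeRF, neural radiance field | |
| 13 | Semantic Segmentation and Scene Reconstruction of RGB-D Image Frames: An End-to-End Modular Pipeline for Robotic Applications | Proposes an end-to-end modular pipeline for semantic segmentation and scene reconstruction of RGB-D image frames, aimed at robotic applications. | scene reconstruction | |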

🔬 Pillar 9: Embodied Foundation Models (5 papers)

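| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 14 | AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models | Proposes AVHBench for evaluating cross-modal hallucination in audio-visual large language models | large language model, multimodal | |
| 15 | TP-Eval: Tap Multimodal LLMs' Potential in Evaluation by Customizing Prompts | TP-Eval: taps the potential of multimodal LLMs in evaluation by customizing prompts | large language model, multimodal | |
| 16 | Addressing Asynchronicity in Clinical Multimodal Fusion via Individualized Chest X-ray Generation | DDL-CXR: addresses asynchronicity in clinical multimodal fusion through individualized chest X-ray generation | multimodal | |
| 17 | UnCLe: Benchmarking Unsupervised Continual Learning for Depth Completion | Proposes the UnCLe benchmark for evaluating unsupervised continual learning in depth completion. | multimodal | |
| 18 | ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting | Proposes visual-temporal context prompting to address open-world interaction | multimodal | |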

🔬 Pillar 1: Robot Control (2 papers)

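| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 19 | CARLA2Real: a tool for reducing the sim2real appearance gap in CARLA simulator | CARLA2Real: a tool for reducing the sim2real appearance gap in the CARLA simulator | sim2real | |
| 20 | WorldSimBench: Towards Video Generation Models as World Simulators | Proposes WorldSimBench for evaluating video generation models as world simulators, covering embodied-intelligence scenarios. | manipulation, predictive model | |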

🔬 Pillar 6: Video Extraction (1 paper)

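| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 21 | Robust Two-View Geometry Estimation with Implicit Differentiation | Proposes a robust two-view geometry estimation framework based on implicit differentiation, improving camera pose estimation accuracy. | feature matching | |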
