cs.CV (2024-08-28)

📊 23 papers in total | 🔗 7 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (12 🔗3) · Pillar 3: Perception & Semantics (5 🔗1) · Pillar 2: RL & Architecture (5 🔗2) · Pillar 8: Physics-based Animation (1 🔗1)

🔬 Pillar 9: Embodied Foundation Models (12 papers)

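# | Title | One-sentence summary | Tags | 🔗
1 | A Survey on Evaluation of Multimodal Large Language Models | Surveys evaluation methods for multimodal large language models to promote more reliable development of artificial general intelligence. | large language model, multimodal
2 | Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Large Language Models | Proposes the DC$^2$ framework, which improves MLLM perception of high-resolution images without any training. | large language model, multimodal
3 | Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders | Eagle: explores the design space of mixture-of-encoders architectures for multimodal large language models. | large language model, multimodal
4 | Does Data-Efficient Generalization Exacerbate Bias in Foundation Models? | Shows that data-efficient generalization may exacerbate bias in foundation models. | foundation model
5 | Using Backbone Foundation Model for Evaluating Fairness in Chest Radiography Without Demographic Data | Uses a backbone foundation model to evaluate fairness in chest radiography without demographic data. | foundation model
6 | Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models | Leverages open knowledge to improve the task-specific expertise of large language models. | large language model
7 | SITransformer: Shared Information-Guided Transformer for Extreme Multimodal Summarization | SITransformer: proposes a shared-information-guided Transformer for extreme multimodal summarization. | multimodal
8 | Benchmarking foundation models as feature extractors for weakly-supervised computational pathology | Benchmarks pathology foundation models as feature extractors for weakly supervised computational pathology. | foundation model
9 | More Text, Less Point: Towards 3D Data-Efficient Point-Language Understanding | Proposes GreenPLM, which uses more text data to improve point-language understanding when 3D data is scarce. | large language model
10 | CSAD: Unsupervised Component Segmentation for Logical Anomaly Detection | Proposes CSAD, an unsupervised component segmentation method that improves logical anomaly detection. | foundation model
11 | TagOOD: A Novel Approach to Out-of-Distribution Detection via Vision-Language Representations and Class Center Learning | TagOOD: a novel out-of-distribution detection approach built on vision-language representations and class center learning. | multimodal
12 | Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input | Kangaroo: a powerful video-language model that supports long-context video input. | large language model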

🔬 Pillar 3: Perception & Semantics (5 papers)

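# | Title | One-sentence summary | Tags | 🔗
13 | RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments | RoboSense: a large-scale dataset and benchmark for egocentric robot perception and navigation in crowded, unstructured environments. | scene understanding, egocentric, multimodal
14 | Towards Realistic Example-based Modeling via 3D Gaussian Stitching | Proposes an example-based modeling method built on 3D Gaussian stitching for seamless fusion and editing of real scenes. | 3D gaussian splatting, 3DGS, gaussian splatting
15 | Single-Photon 3D Imaging with Equi-Depth Photon Histograms | Proposes single-photon 3D imaging based on equi-depth histograms to reduce bandwidth requirements. | visual odometry, PULSE, TAMP
16 | Geometry-guided Feature Learning and Fusion for Indoor Scene Reconstruction | Proposes geometry-guided feature learning and fusion to improve indoor 3D scene reconstruction. | scene reconstruction
17 | Ray-Distance Volume Rendering for Neural Scene Reconstruction | Proposes ray-distance volume rendering to improve neural scene reconstruction in indoor scenes. | scene reconstruction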

🔬 Pillar 2: RL & Architecture (5 papers)

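# | Title | One-sentence summary | Tags | 🔗
18 | LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation | LLaVA-MoD: builds a small, efficient multimodal language model via MoE knowledge distillation. | DPO, direct preference optimization, distillation
19 | MambaPlace:Text-to-Point-Cloud Cross-Modal Place Recognition with Attention Mamba Mechanisms | MambaPlace: text-to-point-cloud cross-modal place recognition using attention Mamba mechanisms. | Mamba, multimodal
20 | Distribution Backtracking Builds A Faster Convergence Trajectory for Diffusion Distillation | Proposes DisBack, which uses distribution backtracking to speed up the convergence of diffusion model distillation. | distillation
21 | Hierarchical Visual Categories Modeling: A Joint Representation Learning and Density Estimation Framework for Out-of-Distribution Detection | Proposes a hierarchical visual category modeling framework that performs out-of-distribution detection through joint representation learning and density estimation. | representation learning
22 | Online pre-training with long-form videos | Explores online pre-training on long-form videos to improve short-video action recognition. | contrastive learning, distillation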

🔬 Pillar 8: Physics-based Animation (1 paper)

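# | Title | One-sentence summary | Tags | 🔗
23 | Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation | Proposes AL-Ref-SAM 2, which uses GPT's temporal-spatial reasoning for training-free audio- and language-referenced video object segmentation. | spatiotemporal, chain-of-thought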
