cs.CV (2024-08-28)

📊 23 papers in total | 🔗 7 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (12 🔗3) · Pillar 3: Perception & Semantics (5 🔗1) · Pillar 2: RL & Architecture (5 🔗2) · Pillar 8: Physics-based Animation (1 🔗1)

🔬 Pillar 9: Embodied Foundation Models (12 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
1	A Survey on Evaluation of Multimodal Large Language Models	Surveys evaluation methods for multimodal large language models to support the development of more reliable general-purpose AI	large language model multimodal
2	Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Large Language Models	Proposes the DC$^2$ framework, which improves MLLM perception of high-resolution images without any training	large language model multimodal
3	Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders	Eagle: explores the design space of mixture-of-encoders in multimodal large language models	large language model multimodal
4	Does Data-Efficient Generalization Exacerbate Bias in Foundation Models?	Shows that data-efficient generalization may exacerbate bias in foundation models	foundation model
5	Using Backbone Foundation Model for Evaluating Fairness in Chest Radiography Without Demographic Data	Uses a backbone foundation model to evaluate fairness in chest radiography without demographic data	foundation model
6	Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models	Leverages open knowledge to improve the task-specific expertise of large language models	large language model	✅
7	SITransformer: Shared Information-Guided Transformer for Extreme Multimodal Summarization	SITransformer: proposes a shared information-guided Transformer for extreme multimodal summarization	multimodal	✅
8	Benchmarking foundation models as feature extractors for weakly-supervised computational pathology	Benchmarks pathology foundation models as feature extractors for weakly supervised computational pathology	foundation model
9	More Text, Less Point: Towards 3D Data-Efficient Point-Language Understanding	Proposes GreenPLM, which uses more text data to improve point-language understanding when 3D data is scarce	large language model	✅
10	CSAD: Unsupervised Component Segmentation for Logical Anomaly Detection	Proposes CSAD, an unsupervised component segmentation method that improves logical anomaly detection performance	foundation model
11	TagOOD: A Novel Approach to Out-of-Distribution Detection via Vision-Language Representations and Class Center Learning	TagOOD: a novel out-of-distribution detection method based on vision-language representations and class center learning	multimodal
12	Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input	Kangaroo: a powerful video-language model that supports long-context video input	large language model

🔬 Pillar 3: Perception & Semantics (5 papers)

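#	Title	One-sentence summary	Tags	🔗	⭐
13	RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments	RoboSense: a large-scale dataset and benchmark for egocentric robot perception and navigation in crowded and unstructured environments	scene understanding egocentric multimodal
14	Towards Realistic Example-based Modeling via 3D Gaussian Stitching	Proposes an example-based modeling method built on 3D Gaussian stitching for seamless fusion and editing of real scenes	3D gaussian splatting 3DGS gaussian splatting	✅
15	Single-Photon 3D Imaging with Equi-Depth Photon Histograms	Proposes single-photon 3D imaging based on equi-depth histograms to reduce bandwidth requirements	visual odometry PULSE TAMP
16	Geometry-guided Feature Learning and Fusion for Indoor Scene Reconstruction	Proposes geometry-guided feature learning and fusion to improve 3D reconstruction of indoor scenes	scene reconstruction
17	Ray-Distance Volume Rendering for Neural Scene Reconstruction	Proposes a ray-distance-based volume rendering method to improve neural scene reconstruction in indoor scenes	scene reconstruction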

🔬 Pillar 2: RL & Architecture (5 papers)

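#	Title	One-sentence summary	Tags	🔗	⭐
18	LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation	LLaVA-MoD: a small, efficient multimodal language model obtained through MoE knowledge distillation	DPO direct preference optimization distillation	✅
19	MambaPlace: Text-to-Point-Cloud Cross-Modal Place Recognition with Attention Mamba Mechanisms	MambaPlace: text-to-point-cloud cross-modal place recognition using attention Mamba mechanisms	Mamba multimodal
20	Distribution Backtracking Builds A Faster Convergence Trajectory for Diffusion Distillation	Proposes DisBack, which uses distribution backtracking to speed up the convergence of diffusion model distillation	distillation	✅
21	Hierarchical Visual Categories Modeling: A Joint Representation Learning and Density Estimation Framework for Out-of-Distribution Detection	Proposes a hierarchical visual categories modeling framework that performs out-of-distribution detection through joint representation learning and density estimation	representation learning
22	Online pre-training with long-form videos	Explores online pre-training on long-form videos to improve action recognition on short videos	contrastive learning distillation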

🔬 Pillar 8: Physics-based Animation (1 paper)

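#	Title	One-sentence summary	Tags	🔗	⭐
23	Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation	Proposes AL-Ref-SAM 2, which uses GPT's temporal-spatial reasoning for training-free audio and language referenced video object segmentation	spatiotemporal chain-of-thought	✅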
