cs.CV (2024-10-23)

📊 21 papers in total | 🔗 9 with code

🎯 Areas of Interest

Pillar 2: RL Algorithms & Architecture (7 🔗3) · Pillar 3: Spatial Perception & Semantics (6 🔗2) · Pillar 9: Embodied Foundation Models (5 🔗2) · Pillar 1: Robot Control (2 🔗1) · Pillar 6: Video Extraction & Matching (1 🔗1)

🔬 Pillar 2: RL Algorithms & Architecture (7 papers)

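#	Title	One-line Summary	Tags	🔗	⭐
1	EntityCLIP: Entity-Centric Image-Text Matching via Multimodal Attentive Contrastive Learning	EntityCLIP: entity-centric image-text matching via multimodal attentive contrastive learning.	contrastive learning, large language model, multimodal
2	Enhancing Multimodal Medical Image Classification using Cross-Graph Modal Contrastive Learning	Proposes CGMCL, a cross-graph modal contrastive learning framework that improves multimodal medical image classification.	representation learning, contrastive learning, multimodal
3	ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning	ADEM-VL: an adaptive and embedded fusion method for efficiently tuning vision-language models.	representation learning, large language model, multimodal	✅
4	MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models	MIA-DPO: multi-image augmented direct preference optimization that improves multi-image understanding in large vision-language models.	DPO, direct preference optimization
5	Towards Effective Data-Free Knowledge Distillation via Diverse Diffusion Augmentation	Proposes a data-free knowledge distillation method based on diffusion augmentation that increases the diversity of the synthetic data.	teacher-student distillation	✅
6	Rethinking Positive Pairs in Contrastive Learning	SimLAP: learns visual representations from arbitrary sample pairs, lifting contrastive learning's restriction to positive pairs.	contrastive learning
7	CLEAR: Character Unlearning in Textual and Visual Modalities	Proposes CLEAR, an open benchmark for machine unlearning across textual and visual modalities.	DPO, multimodal	✅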

🔬 Pillar 3: Spatial Perception & Semantics (6 papers)

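#	Title	One-line Summary	Tags	🔗	⭐
8	PLGS: Robust Panoptic Lifting with 3D Gaussian Splatting	Proposes PLGS for robust panoptic segmentation of 3D Gaussian Splatting scenes under noise.	3D gaussian splatting, 3DGS, gaussian splatting
9	VR-Splatting: Foveated Radiance Field Rendering via 3D Gaussian Splatting and Neural Points	VR-Splatting: foveated radiance field rendering that combines 3D Gaussian Splatting with neural points to improve the VR experience.	3D gaussian splatting, 3DGS, gaussian splatting	✅
10	Efficient Neural Implicit Representation for 3D Human Reconstruction	Proposes HumanAvatar, which combines HuMoR, Instant-NGP, and Fast-SNARF to efficiently reconstruct 3D human avatars.	NeRF, neural radiance field, implicit representation
11	OVT-B: A New Large-Scale Benchmark for Open-Vocabulary Multi-Object Tracking	Builds OVT-B, a large-scale open-vocabulary multi-object tracking benchmark, and proposes a baseline that incorporates motion features.	open-vocabulary, open vocabulary	✅
12	Few-shot NeRF by Adaptive Rendering Loss Regularization	Proposes AR-NeRF, which tackles few-shot NeRF novel view synthesis through adaptive rendering loss regularization.	NeRF, neural radiance field
13	Semantic Segmentation and Scene Reconstruction of RGB-D Image Frames: An End-to-End Modular Pipeline for Robotic Applications	Proposes an end-to-end modular pipeline for semantic segmentation and scene reconstruction from RGB-D image frames, aimed at robotic applications.	scene reconstruction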

🔬 Pillar 9: Embodied Foundation Models (5 papers)

#	Title	One-line Summary	Tags	🔗	⭐
14	AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models	Proposes AVHBench, a benchmark for evaluating cross-modal hallucination in audio-visual large language models.	large language model, multimodal	✅
15	TP-Eval: Tap Multimodal LLMs' Potential in Evaluation by Customizing Prompts	TP-Eval: customizes prompts to draw out the full potential of multimodal LLMs during evaluation.	large language model, multimodal
16	Addressing Asynchronicity in Clinical Multimodal Fusion via Individualized Chest X-ray Generation	DDL-CXR: addresses asynchronicity in clinical multimodal fusion by generating individualized chest X-rays.	multimodal
17	UnCLe: Benchmarking Unsupervised Continual Learning for Depth Completion	Proposes UnCLe, a benchmark for evaluating unsupervised continual learning for depth completion.	multimodal
18	ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting	Proposes visual-temporal context prompting to tackle open-world interaction.	multimodal	✅

🔬 Pillar 1: Robot Control (2 papers)

#	Title	One-line Summary	Tags	🔗	⭐
19	CARLA2Real: a tool for reducing the sim2real appearance gap in CARLA simulator	CARLA2Real: a tool for reducing the sim2real appearance gap in the CARLA simulator.	sim2real	✅
20	WorldSimBench: Towards Video Generation Models as World Simulators	Proposes WorldSimBench to evaluate video generation models as world simulators, including embodied AI scenarios.	manipulation, predictive model

🔬 Pillar 6: Video Extraction & Matching (1 paper)

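#	Title	One-line Summary	Tags	🔗	⭐
21	Robust Two-View Geometry Estimation with Implicit Differentiation	Proposes a robust two-view geometry estimation framework based on implicit differentiation that improves camera pose estimation accuracy.	feature matching	✅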
