cs.CV (2024-06-20)

📊 16 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (8 🔗3) · Pillar 2: RL & Architecture (4 🔗1) · Pillar 1: Robot Control (2) · Pillar 3: Perception & Semantics (1) · Pillar 6: Video Extraction (1)

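🔬 Pillar 9: Embodied Foundation Models (8 papers)

| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Using Multimodal Foundation Models and Clustering for Improved Style Ambiguity Loss | Proposes a style ambiguity loss based on multimodal foundation models and clustering to improve the creativity of text-to-image generation models. | foundation model, multimodal | |
| 2 | The Use of Multimodal Large Language Models to Detect Objects from Thermal Images: Transportation Applications | Uses multimodal large language models to detect objects in thermal images, with applications to intelligent transportation systems. | large language model, multimodal | |
| 3 | A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models | A survey of multimodal-guided image editing techniques based on text-to-image diffusion models. | multimodal | |
| 4 | HeartBeat: Towards Controllable Echocardiography Video Synthesis with Multimodal Conditions-Guided Diffusion Models | HeartBeat: a diffusion model guided by multimodal conditions for controllable echocardiography video synthesis. | multimodal | |
| 5 | Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs | Prism: a framework that decouples and assesses the capabilities of vision-language models, improving performance and reducing cost. | large language model, multimodal | |
| 6 | E-ANT: A Large-Scale Dataset for Efficient Automatic GUI NavigaTion | Introduces E-ANT, a large-scale Chinese GUI navigation dataset, to advance the use of multimodal large models on mobile devices. | large language model, multimodal | |
| 7 | Towards Event-oriented Long Video Understanding | Proposes the Event-Bench benchmark and the VIM method to improve MLLMs' event-oriented long video understanding. | large language model, multimodal | |
| 8 | From Descriptive Richness to Bias: Unveiling the Dark Side of Generative Image Caption Enrichment | Reveals the negative effects of generative image caption enrichment: bias and hallucination. | large language model | |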

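🔬 Pillar 2: RL & Architecture (4 papers)

| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 9 | Deblurring Neural Radiance Fields with Event-driven Bundle Adjustment | Proposes EBAD-NeRF, which uses event camera data to address NeRF reconstruction in motion-blurred scenes. | representation learning, NeRF, neural radiance field | |
| 10 | Apprenticeship-Inspired Elegance: Synergistic Knowledge Distillation Empowers Spiking Neural Networks for Efficient Single-Eye Emotion Recognition | Proposes a knowledge-distillation-based synergistic learning framework to improve the efficiency of SNN single-eye emotion recognition. | distillation, multimodal | |
| 11 | Regularized Distribution Matching Distillation for One-step Unpaired Image-to-Image Translation | Proposes regularized distribution matching distillation to address unpaired image-to-image translation. | distillation | |
| 12 | Seg-LSTM: Performance of xLSTM for Semantic Segmentation of Remotely Sensed Images | Seg-LSTM: evaluates the performance of xLSTM for semantic segmentation of remote sensing images and analyzes its limitations. | Mamba, large language model | |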

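🔬 Pillar 1: Robot Control (2 papers)

| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 13 | Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation | Proposes a human-robot contrastive alignment method to mitigate the domain discrepancy in visual pre-training for robots. | manipulation, visual pre-training, language conditioned | |
| 14 | Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps | Proposes invertible Consistency Distillation (iCD), enabling text-guided image editing in only about 7 steps. | manipulation, distillation, classifier-free guidance | |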

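🔬 Pillar 3: Perception & Semantics (1 paper)

| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 15 | CityNav: A Large-Scale Dataset for Real-World Aerial Navigation | CityNav: a large-scale dataset for real-world aerial navigation. | semantic map, spatial relationship, VLN | |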

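🔬 Pillar 6: Video Extraction (1 paper)

| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 16 | VLM Agents Generate Their Own Memories: Distilling Experience into Embodied Programs of Thought | ICAL: VLM agents generate high-quality experience through self-reflection, improving performance on embodied AI tasks. | Ego4D, instruction following | |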
