cs.CV (2024-11-25)

📊 18 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (9, 🔗 2) · Pillar 3: Perception & Semantics (3) · Pillar 1: Robot Control (3) · Pillar 6: Video Extraction (2) · Pillar 4: Generative Motion (1)

🔬 Pillar 9: Embodied Foundation Models (9 papers)

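| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | MOSABench: Multi-Object Sentiment Analysis Benchmark for Evaluating Multimodal Large Language Models Understanding of Complex Image | Proposes MOSABench, a benchmark for evaluating the image understanding ability of multimodal large language models in multi-object sentiment analysis. | large language model, multimodal | |
| 2 | Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering | Proposes ReflectiVA, which augments multimodal LLMs with self-reflective tokens for knowledge-based visual question answering. | large language model, multimodal | |
| 3 | Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models | Chat2SVG: a vector graphics generation framework that combines large language models with image diffusion models. | large language model | |
| 4 | ENCLIP: Ensembling and Clustering-Based Contrastive Language-Image Pretraining for Fashion Multimodal Search with Limited Data and Low-Quality Images | Proposes ENCLIP, which uses ensembling and clustering to improve CLIP's fashion multimodal search performance with limited data and low-quality images. | multimodal | |
| 5 | Debiasing Classifiers by Amplifying Bias with Latent Diffusion and Large Language Models | Proposes DiffuBias, which uses latent diffusion models and large language models to make classifiers more robust and address biased learning. | large language model | |
| 6 | Beyond Sight: Towards Cognitive Alignment in LVLM via Enriched Visual Knowledge | Proposes Entity-Enhanced Cognitive Alignment (EECA) to align visual knowledge with the language model's cognitive framework in LVLMs. | large language model, multimodal | |
| 7 | Leveraging the Power of MLLMs for Gloss-Free Sign Language Translation | MMSLT: gloss-free sign language translation using multimodal large language models. | large language model, multimodal | |
| 8 | LaB-RAG: Label Boosted Retrieval Augmented Generation for Radiology Report Generation | Proposes LaB-RAG, which uses label-boosted retrieval-augmented generation to improve radiology report generation. | large language model | |
| 9 | All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages | Proposes ALM-bench for evaluating the understanding and reasoning abilities of LMMs across 100 culturally diverse languages. | multimodal | |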

🔬 Pillar 3: Perception & Semantics (3 papers)

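| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 10 | SplatAD: Real-Time Lidar and Camera Rendering with 3D Gaussian Splatting for Autonomous Driving | Proposes SplatAD for real-time lidar and camera rendering in autonomous driving. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 11 | UnitedVLN: Generalizable Gaussian Splatting for Continuous Vision-Language Navigation | Proposes UnitedVLN, which uses generalizable Gaussian splatting for continuous vision-language navigation. | 3DGS, gaussian splatting, splatting | |
| 12 | A Review of Bayesian Uncertainty Quantification in Deep Probabilistic Image Segmentation | Reviews Bayesian uncertainty quantification methods in deep probabilistic image segmentation, in support of reliable decision-making. | scene understanding | |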

🔬 Pillar 1: Robot Control (3 papers)

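| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 13 | RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics | RoboSpatial: a dataset for teaching spatial understanding to 2D/3D vision-language models for robotics. | manipulation, affordance, egocentric | |
| 14 | Curvature Informed Furthest Point Sampling | Proposes a curvature-guided, reinforcement-learning-based FPS sampling algorithm that improves performance on point cloud processing tasks. | manipulation, reinforcement learning | |
| 15 | Lens Distortion Encoding System Version 1.0 | Proposes the Lens Distortion Encoding System (LDES) for seamless lens distortion correction and conversion of high-quality motion pictures. | manipulation | |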

🔬 Pillar 6: Video Extraction (2 papers)

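| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 16 | UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing | Proposes the UniPose framework to address multimodal control in human pose comprehension and generation. | SMPL, large language model, multimodal | |
| 17 | Multi-Resolution Generative Modeling of Human Motion from Limited Data | Proposes a multi-resolution generative model for synthesizing realistic human motion from limited data. | SMPL | |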

🔬 Pillar 4: Generative Motion (1 paper)

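| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 18 | DreamRunner: Fine-Grained Compositional Story-to-Video Generation with Retrieval-Augmented Motion Adaptation | DreamRunner: a fine-grained compositional story-to-video generation method with retrieval-augmented motion adaptation. | motion synthesis, motion adaptation, large language model | |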
