cs.CV (2024-09-25)

📊 27 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (14 🔗4) · Pillar 3: Perception & Semantics (8) · Pillar 2: RL & Architecture (3) · Pillar 7: Motion Retargeting (1) · Pillar 5: Interaction & Reaction (1)

🔬 Pillar 9: Embodied Foundation Models (14 papers)

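# | Title | One-line summary | Tags | 🔗
1 | EAGLE: Towards Efficient Arbitrary Referring Visual Prompts Comprehension for Multimodal Large Language Models | EAGLE: efficient comprehension of arbitrary referring visual prompts for multimodal large language models | large language model, multimodal, instruction following |
2 | Unveiling Ontological Commitment in Multi-Modal Foundation Models | Proposes a method for extracting concept hierarchy relations from multimodal models, used to validate and calibrate the models | foundation model, multimodal |
3 | First Place Solution to the ECCV 2024 BRAVO Challenge: Evaluating Robustness of Vision Foundation Models for Semantic Segmentation | Uses the DINOv2 vision foundation model with a simple segmentation decoder to improve the robustness of semantic segmentation | foundation model |
4 | Block Expanded DINORET: Adapting Natural Domain Foundation Models for Retinal Imaging Without Catastrophic Forgetting | Proposes Block Expanded DINORET to address catastrophic forgetting when transferring natural-domain pretrained models to retinal imaging | foundation model |
5 | Targeted Neural Architectures in Multi-Objective Frameworks for Complete Glioma Characterization from Multimodal MRI | Neural architectures targeted at multimodal MRI, within a multi-objective framework for complete glioma characterization | multimodal |
6 | ControlCity: A Multimodal Diffusion Model Based Approach for Accurate Geospatial Data Generation and Urban Morphology Analysis | ControlCity: generates accurate geospatial data and analyzes urban morphology with a multimodal diffusion model | multimodal |
7 | Robust Scene Change Detection Using Visual Foundation Models and Cross-Attention Mechanisms | Proposes a robust scene change detection method based on DINOv2 and cross-attention | foundation model |
8 | MaViLS, a Benchmark Dataset for Video-to-Slide Alignment, Assessing Baseline Accuracy with a Multimodal Alignment Algorithm Leveraging Speech, OCR, and Visual Features | MaViLS: a benchmark dataset for video-to-slide alignment, with a multimodal alignment algorithm | multimodal |
9 | Pix2Next: Leveraging Vision Foundation Models for RGB to NIR Image Translation | Pix2Next: uses vision foundation models for RGB-to-near-infrared image translation | foundation model |
10 | Underwater Camouflaged Object Tracking Meets Vision-Language SAM2 | Introduces UW-COT220, the first large-scale multimodal dataset for underwater camouflaged object tracking, and VL-SAM2, a SAM2-based vision-language tracking framework | foundation model, multimodal |
11 | ChatCam: Empowering Camera Control through Conversational AI | ChatCam: empowers camera control through conversational AI, emulating a professional cinematographer's workflow | large language model |
12 | Navigating the Nuances: A Fine-grained Evaluation of Vision-Language Navigation | Proposes a fine-grained evaluation framework for vision-language navigation based on a context-free grammar | VLN |
13 | Attention Prompting on Image for Large Vision-Language Models | Proposes an attention prompting method on images that improves large vision-language models' ability to follow text instructions | large language model |
14 | DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling | DALDA: data augmentation with a diffusion model and an LLM, adaptively adjusting guidance scaling to improve few-shot learning performance | large language model |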

🔬 Pillar 3: Perception & Semantics (8 papers)

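# | Title | One-line summary | Tags | 🔗
15 | SeaSplat: Representing Underwater Scenes with 3D Gaussian Splatting and a Physically Grounded Image Formation Model | SeaSplat: real-time rendering of underwater scenes using 3D Gaussian splatting and a physical image formation model | 3D gaussian splatting, 3DGS, gaussian splatting |
16 | Optical Lens Attack on Deep Learning Based Monocular Depth Estimation | Proposes LensAttack: a physical attack that uses optical lenses to disrupt monocular depth estimation | depth estimation, monocular depth |
17 | Generative Object Insertion in Gaussian Splatting with a Multi-View Diffusion Model | Proposes an object insertion method for Gaussian splatting based on a multi-view diffusion model, achieving high-quality 3D scene reconstruction | gaussian splatting, splatting |
18 | 3DDX: Bone Surface Reconstruction from a Single Standard-Geometry Radiograph via Dual-Face Depth Estimation | Proposes 3DDX, which reconstructs bone surfaces from a single standard radiograph via dual-face depth estimation | depth estimation, penetration |
19 | Disco4D: Disentangled 4D Human Generation and Animation from a Single Image | Disco4D: a disentangled 4D human generation and animation framework that generates realistic dynamic humans from a single image | gaussian splatting, splatting, SMPL |
20 | Parameter-efficient Bayesian Neural Networks for Uncertainty-aware Depth Estimation | Proposes parameter-efficient Bayesian neural networks for uncertainty-aware depth estimation | depth estimation, monocular depth |
21 | EventHDR: from Event to High-Speed HDR Videos and Beyond | EventHDR: proposes an event-camera-based method for high-speed HDR video reconstruction and builds a real-world dataset | depth estimation, monocular depth, optical flow |
22 | Pose-Guided Fine-Grained Sign Language Video Generation | Proposes a pose-guided motion model for generating fine-grained, temporally consistent sign language videos | optical flow |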

🔬 Pillar 2: RL & Architecture (3 papers)

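# | Title | One-line summary | Tags | 🔗
23 | Towards General Text-guided Image Synthesis for Customized Multimodal Brain MRI Generation | Proposes TUMSyn, a text-guided general-purpose brain MRI synthesis model for customized multimodal MRI generation | contrastive learning, multimodal |
24 | PASS: Path-selective State Space Model for Event-based Recognition | Proposes the PASS framework, which uses a path-selective state space model to improve frequency generalization in event-camera recognition | SSM, state space model, spatiotemporal |
25 | Adverse Weather Optical Flow: Cumulative Homogeneous-Heterogeneous Adaptation | Proposes a cumulative homogeneous-heterogeneous adaptation framework to tackle optical flow estimation in adverse weather | distillation, optical flow |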

🔬 Pillar 7: Motion Retargeting (1 paper)

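# | Title | One-line summary | Tags | 🔗
26 | Spotlight Text Detector: Spotlight on Candidate Regions Like a Camera | Proposes the Spotlight Text Detector (STD) to handle irregular shapes and dense overlapping text in scene text detection | spatial relationship |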

🔬 Pillar 5: Interaction & Reaction (1 paper)

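# | Title | One-line summary | Tags | 🔗
27 | A Versatile and Differentiable Hand-Object Interaction Representation | Proposes CHOIR: a versatile and differentiable hand-object interaction representation for accurate HOI synthesis | HOI |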
