cs.CV (2024-07-05)

📊 23 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (10 🔗2) | Pillar 3: Perception & Semantics (8 🔗1) | Pillar 2: RL & Architecture (4) | Pillar 7: Motion Retargeting (1)

🔬 Pillar 9: Embodied Foundation Models (10 papers)

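# | Title | One-sentence summary | Tags | 🔗
1 | Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge | Proposes a visual prompting method based on external knowledge to improve multimodal large language models' understanding of fine-grained visual information. | large language model, multimodal |
2 | MobileFlow: A Multimodal LLM For Mobile GUI Agent | MobileFlow: a multimodal large language model for mobile GUI agents that improves Chinese GUI understanding and interaction. | large language model, multimodal |
3 | Elevating All Zero-Shot Sketch-Based Image Retrieval Through Multimodal Prompt Learning | Proposes SpLIP, which improves zero-shot sketch-based image retrieval through multimodal prompt learning. | foundation model, multimodal |
4 | MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation? | MJ-Bench: evaluates how well multimodal reward models judge text-to-image generation. | multimodal |
5 | VCoME: Verbal Video Composition with Multimodal Editing Effects | VCoME: proposes a framework for automatic verbal video composition with multimodal editing effects, improving video clarity and visual appeal. | multimodal |
6 | Robust Multimodal Learning via Representation Decoupling | Proposes DMRNet, which achieves robust multimodal learning by decoupling multimodal representations. | multimodal |
7 | Second Place Solution of WSDM2023 Toloka Visual Question Answering Challenge | Proposes a three-stage visual question answering solution based on OFA that took second place in the WSDM2023 Toloka VQA Challenge. | multimodal, visual grounding |
8 | AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation | Proposes the AWT framework, which improves the transferability of vision-language models through augmentation, weighting, and transportation. | multimodal |
9 | Dude: Dual Distribution-Aware Context Prompt Learning For Large Vision-Language Model | Proposes Dude, a dual distribution-aware context prompt learning framework that improves large vision-language models on fine-grained classification tasks. | large language model |
10 | Towards Context-aware Support for Color Vision Deficiency: An Approach Integrating LLM and AR | Proposes a context-aware support system for color vision deficiency that combines LLM and AR. | large language model |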

🔬 Pillar 3: Perception & Semantics (8 papers)

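# | Title | One-sentence summary | Tags | 🔗
11 | Unsupervised 4D Cardiac Motion Tracking with Spatiotemporal Optical Flow Networks | Proposes an unsupervised 4D cardiac motion tracking method based on spatiotemporal optical flow networks, improving the accuracy of echocardiogram analysis. | optical flow, spatiotemporal, motion tracking |
12 | ZARRIO @ Ego4D Short Term Object Interaction Anticipation Challenge: Leveraging Affordances and Attention-based models for STA | Proposes STAformer, which combines environment awareness with attention mechanisms to improve short-term object interaction anticipation on Ego4D. | affordance, egocentric, Ego4D |
13 | GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction | GSD: single-view 3D reconstruction based on a Gaussian splatting diffusion model. | gaussian splatting, splatting |
14 | CountGD: Multi-Modal Open-World Counting | Proposes CountGD, a multi-modal open-world counting model with improved generality and accuracy. | open-vocabulary, open vocabulary, foundation model |
15 | Hybrid Primal Sketch: Combining Analogy, Qualitative Representations, and Computer Vision for Scene Understanding | Proposes the Hybrid Primal Sketch framework, which combines computer vision with cognitive models for scene understanding. | scene understanding |
16 | Segment Any 4D Gaussians | Proposes the SA4D framework, which segments arbitrary objects in 4D Gaussian models. | 3D gaussian splatting, gaussian splatting, splatting |
17 | A Physical Model-Guided Framework for Underwater Image Enhancement and Depth Estimation | Proposes a physical model-guided framework for underwater image enhancement and depth estimation. | depth estimation |
18 | Gaussian Eigen Models for Human Heads | Proposes Gaussian Eigen Models (GEM) for creating lightweight, high-quality, easily controllable human head avatars. | gaussian splatting, splatting |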

🔬 Pillar 2: RL & Architecture (4 papers)

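# | Title | One-sentence summary | Tags | 🔗
19 | Self-Supervised Representation Learning for Adversarial Attack Detection | Proposes a self-supervised representation learning framework to improve the generalization of adversarial attack detection. | representation learning |
20 | Fine-grained Context and Multi-modal Alignment for Freehand 3D Ultrasound Reconstruction | Proposes ReMamba, combined with multi-modal alignment, for freehand 3D ultrasound reconstruction. | Mamba, SSM, state space model |
21 | AMD: Automatic Multi-step Distillation of Large-scale Vision Models | Proposes AMD, an automatic multi-step distillation method for compressing large-scale vision models. | distillation |
22 | MARS: Paying more attention to visual attributes for text-based person search | MARS: improves text-based person search by paying more attention to visual attributes. | masked autoencoder, MAE |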

🔬 Pillar 7: Motion Retargeting (1 paper)

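# | Title | One-sentence summary | Tags | 🔗
23 | Neural varifolds: an aggregate representation for quantifying the geometry of point clouds | Proposes a neural varifold representation for quantifying point cloud geometry, improving shape matching and few-shot classification. | geometric consistency |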
