cs.CV (2024-06-05)

📊 25 papers in total | 🔗 7 with code

🎯 Areas of Interest

Pillar 9: Embodied Foundation Models (11 🔗3)
Pillar 3: Spatial Perception & Semantics (7 🔗2)
Pillar 2: RL Algorithms & Architecture (6 🔗1)
Pillar 6: Video Extraction & Matching (1 🔗1)

🔬 Pillar 9: Embodied Foundation Models (11 papers)

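# | Title | One-line Summary | Tags
1 | Enhancing Multimodal Large Language Models with Multi-instance Visual Prompt Generator for Visual Representation Enrichment | Proposes MIVPG, a multi-instance visual prompt generator that enriches visual representations in multimodal large language models. | large language model, multimodal
2 | Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach | PlugIR: uses large language models for interactive text-to-image retrieval without fine-tuning. | large language model, instruction following
3 | Identification of Stone Deterioration Patterns with Large Multimodal Models | Uses large multimodal models to identify stone deterioration patterns in support of cultural heritage conservation. | multimodal
4 | Radiomics-guided Multimodal Self-attention Network for Predicting Pathological Complete Response in Breast MRI | Proposes a radiomics-guided multimodal self-attention network for predicting pathological complete response from breast MRI. | multimodal
5 | AD-H: Autonomous Driving with Hierarchical Agents | Proposes AD-H, an autonomous driving system built on hierarchical agents that improves generalization and interpretability. | large language model, multimodal
6 | DiffCut: Catalyzing Zero-Shot Semantic Segmentation with Diffusion Features and Recursive Normalized Cut | DiffCut: zero-shot semantic segmentation driven by diffusion model features and a recursive normalized cut. | foundation model, multimodal
7 | Exploiting LMM-based knowledge for image classification tasks | Enhances image classification with LMM knowledge by fusing image and text embeddings. | multimodal
8 | Adapter-X: A Novel General Parameter-Efficient Fine-Tuning Framework for Vision | Proposes Adapter-X, a general parameter-efficient fine-tuning framework for vision that outperforms full fine-tuning. | foundation model
9 | Attend and Enrich: Enhanced Visual Prompt for Zero-Shot Learning | Proposes AENet, which improves zero-shot generalization through semantically enriched visual prompts. | zero-shot transfer
10 | Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models | Proposes a weighted visual-text cross-alignment method that improves the zero-shot performance of vision-language models. | large language model
11 | PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM | PosterLLaVa: builds a unified multimodal layout generator on top of a multimodal large language model. | large language model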

🔬 Pillar 3: Spatial Perception & Semantics (7 papers)

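# | Title | One-line Summary | Tags
12 | Event3DGS: Event-Based 3D Gaussian Splatting for High-Speed Robot Egomotion | Event3DGS: 3D Gaussian splatting from event cameras for high-speed robots. | 3D gaussian splatting, 3DGS, gaussian splatting
13 | Polarization Wavefront Lidar: Learning Large Scene Reconstruction from Polarized Wavefronts | Proposes polarization wavefront lidar (PolLidar) for 3D reconstruction of long-range scenes, improving the accuracy of surface-normal and distance estimation. | scene reconstruction, PULSE
14 | Degrees of Freedom Matter: Inferring Dynamics from Point Trajectories | Proposes a dynamic point field model built on motion degrees of freedom for inferring scene dynamics from point trajectories. | scene reconstruction, spatiotemporal, motion tracking
15 | Gaussian Primitives for Deformable Image Registration | Proposes GaussianDIR, which uses Gaussian primitives for deformable image registration, improving accuracy and efficiency. | 3D gaussian splatting, gaussian splatting, splatting
16 | GSGAN: Adversarial Learning for Hierarchical Generation of 3D Gaussian Splats | GSGAN: an adversarial-learning method for hierarchical generation of 3D Gaussian splats that speeds up generation. | 3D gaussian splatting, gaussian splatting, splatting
17 | Towards Multiple Character Image Animation Through Enhancing Implicit Decoupling | Proposes a multi-condition guidance framework that strengthens implicit decoupling to animate images with multiple characters against complex backgrounds. | optical flow, character animation
18 | CoFie: Learning Compact Neural Surface Representations with Coordinate Fields | CoFie: learns compact neural surface representations with coordinate fields, substantially reducing shape error. | implicit representation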

🔬 Pillar 2: RL Algorithms & Architecture (6 papers)

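# | Title | One-line Summary | Tags
19 | Computation-Efficient Era: A Comprehensive Survey of State Space Models in Medical Image Analysis | Surveys state space models such as Mamba as a way to address computational efficiency in medical image analysis. | Mamba, SSM, state space model
20 | Tiny models from tiny data: Textual and null-text inversion for few-shot distillation | Proposes TINT, a few-shot distillation method that combines textual and null-text inversion to improve the accuracy of small models. | distillation, foundation model
21 | Dream-in-Style: Text-to-3D Generation Using Stylized Score Distillation | Dream-in-Style: a text-to-3D generation method based on stylized score distillation. | distillation, neural radiance field
22 | Multi-Task Multi-Scale Contrastive Knowledge Distillation for Efficient Medical Image Segmentation | Proposes multi-task, multi-scale contrastive knowledge distillation for efficient medical image segmentation. | contrastive learning, distillation
23 | Self-Supervised Skeleton-Based Action Representation Learning: A Benchmark and Beyond | Proposes a multi-granularity self-supervised learning framework that improves the generalization of skeleton-based action representations. | representation learning, contrastive learning
24 | FILS: Self-Supervised Video Feature Prediction In Semantic Language Space | Proposes FILS, a self-supervised method that predicts video features in a semantic language space to improve video representations. | visual pre-training, egocentric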

🔬 Pillar 6: Video Extraction & Matching (1 paper)

# | Title | One-line Summary | Tags
25 | EgoSurgery-Tool: A Dataset of Surgical Tool and Hand Detection from Egocentric Open Surgery Videos | EgoSurgery-Tool: an egocentric open-surgery video dataset for surgical tool and hand detection. | egocentric