cs.CV (2024-06-05)

📊 25 papers in total | 🔗 7 with code

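🎯 Interest Area Navigation

Pillar 9: Embodied Large Models (Embodied Foundation Models) (11 🔗3) | Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (7 🔗2) | Pillar 2: RL Algorithms and Architecture (RL & Architecture) (6 🔗1) | Pillar 6: Video Extraction and Matching (Video Extraction) (1 🔗1)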

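🔬 Pillar 9: Embodied Large Models (Embodied Foundation Models) (11 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	Enhancing Multimodal Large Language Models with Multi-instance Visual Prompt Generator for Visual Representation Enrichment	Proposes MIVPG, a multi-instance visual prompt generator that enriches visual representations in multimodal large language models.	large language model multimodal
2	Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach	PlugIR: interactive text-to-image retrieval with large language models, no fine-tuning required.	large language model instruction following	✅
3	Identification of Stone Deterioration Patterns with Large Multimodal Models	Uses large multimodal models to identify stone deterioration patterns, supporting cultural heritage conservation.	multimodal
4	Radiomics-guided Multimodal Self-attention Network for Predicting Pathological Complete Response in Breast MRI	Proposes a radiomics-guided multimodal self-attention network for predicting pathological complete response from breast MRI.	multimodal
5	AD-H: Autonomous Driving with Hierarchical Agents	Proposes AD-H, a hierarchical-agent autonomous driving system that improves generalization and interpretability.	large language model multimodal	✅
6	DiffCut: Catalyzing Zero-Shot Semantic Segmentation with Diffusion Features and Recursive Normalized Cut	DiffCut: zero-shot semantic segmentation using diffusion model features and recursive Normalized Cut.	foundation model multimodal
7	Exploiting LMM-based knowledge for image classification tasks	Enhances image classification with LMM knowledge by fusing image and text embeddings.	multimodal
8	Adapter-X: A Novel General Parameter-Efficient Fine-Tuning Framework for Vision	Proposes Adapter-X, a general parameter-efficient fine-tuning framework for vision that outperforms full fine-tuning.	foundation model
9	Attend and Enrich: Enhanced Visual Prompt for Zero-Shot Learning	Proposes AENet, which improves zero-shot learning generalization through semantically enhanced visual prompts.	zero-shot transfer
10	Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models	Proposes weighted visual-text cross alignment to improve the zero-shot performance of vision-language models.	large language model
11	PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM	PosterLLaVa: builds a unified multimodal layout generator on a multimodal large language model.	large language model	✅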

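🔬 Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (7 papers)

#	Title	One-line summary	Tags	🔗	⭐
12	Event3DGS: Event-Based 3D Gaussian Splatting for High-Speed Robot Egomotion	Event3DGS: event-camera-based 3D Gaussian splatting for high-speed robot egomotion.	3D gaussian splatting 3DGS gaussian splatting
13	Polarization Wavefront Lidar: Learning Large Scene Reconstruction from Polarized Wavefronts	Proposes polarization wavefront lidar (PolLidar) for 3D reconstruction of long-range scenes, improving normal and distance estimation accuracy.	scene reconstruction PULSE
14	Degrees of Freedom Matter: Inferring Dynamics from Point Trajectories	Proposes a dynamic point field model based on motion degrees of freedom for inferring scene dynamics from point trajectories.	scene reconstruction spatiotemporal motion tracking	✅
15	Gaussian Primitives for Deformable Image Registration	Proposes GaussianDIR, which uses Gaussian primitives for deformable image registration, improving accuracy and efficiency.	3D gaussian splatting gaussian splatting splatting
16	GSGAN: Adversarial Learning for Hierarchical Generation of 3D Gaussian Splats	GSGAN: adversarial learning for hierarchical generation of 3D Gaussian splats, improving generation speed.	3D gaussian splatting gaussian splatting splatting	✅
17	Towards Multiple Character Image Animation Through Enhancing Implicit Decoupling	Proposes a multi-condition guidance framework that strengthens implicit decoupling to animate multiple characters against complex backgrounds.	optical flow character animation
18	CoFie: Learning Compact Neural Surface Representations with Coordinate Fields	CoFie: learns compact neural surface representations with coordinate fields, significantly reducing shape error.	implicit representation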

🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (6 papers)

#	Title	One-line summary	Tags	🔗	⭐
19	Computation-Efficient Era: A Comprehensive Survey of State Space Models in Medical Image Analysis	Surveys state space models such as Mamba in medical image analysis, with a focus on computational efficiency.	Mamba SSM state space model
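20	Tiny models from tiny data: Textual and null-text inversion for few-shot distillation	Proposes TINT, a few-shot distillation method combining textual and null-text inversion, improving small-model accuracy.	distillation foundation model	✅
21	Dream-in-Style: Text-to-3D Generation Using Stylized Score Distillation	Dream-in-Style: text-to-3D generation based on stylized score distillation.	distillation neural radiance field
22	Multi-Task Multi-Scale Contrastive Knowledge Distillation for Efficient Medical Image Segmentation	Proposes multi-task multi-scale contrastive knowledge distillation to make medical image segmentation more efficient.	contrastive learning distillation
23	Self-Supervised Skeleton-Based Action Representation Learning: A Benchmark and Beyond	Proposes a multi-granularity self-supervised learning framework that improves the generalization of skeleton-based action representations.	representation learning contrastive learning
24	FILS: Self-Supervised Video Feature Prediction In Semantic Language Space	Proposes FILS, a self-supervised method that predicts video features in a semantic language space, improving video representations.	visual pre-training egocentric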

🔬 Pillar 6: Video Extraction and Matching (Video Extraction) (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
25	EgoSurgery-Tool: A Dataset of Surgical Tool and Hand Detection from Egocentric Open Surgery Videos	EgoSurgery-Tool: a dataset for surgical tool and hand detection in egocentric open surgery videos.	egocentric	✅
