cs.CV (2024-11-11)

📊 19 papers in total | 🔗 7 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (7 🔗4) · Pillar 2: RL & Architecture (6 🔗2) · Pillar 3: Perception & Semantics (4 🔗1) · Pillar 1: Robot Control (1) · Pillar 6: Video Extraction (1)

🔬 Pillar 9: Embodied Foundation Models (7 papers)

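| # | Title | Summary | Tags |
|---|---|---|---|
| 1 | CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models | CapeLLM: support-free, category-agnostic pose estimation based on multimodal large language models. | large language model, multimodal |
| 2 | StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification | StoryTeller: improves long video description through global audio-visual character identification. | large language model, multimodal |
| 3 | OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision | OmniEdit: builds a generalist image editing model through specialist supervision, covering seven editing tasks at any aspect ratio. | multimodal |
| 4 | ConvMixFormer- A Resource-efficient Convolution Mixer for Transformer-based Dynamic Hand Gesture Recognition | Proposes ConvMixFormer, a resource-efficient convolution-mixer Transformer for dynamic hand gesture recognition. | multimodal |
| 5 | MapSAM: Adapting Segment Anything Model for Automated Feature Detection in Historical Maps | MapSAM: automated feature detection in historical maps via efficient fine-tuning of SAM. | foundation model |
| 6 | UMFC: Unsupervised Multi-Domain Feature Calibration for Vision-Language Models | Proposes UMFC, an unsupervised multi-domain feature calibration method that improves the generalization of vision-language models in cross-domain scenarios. | zero-shot transfer |
| 7 | Track Any Peppers: Weakly Supervised Sweet Pepper Tracking Using VLMs | Track Any Peppers: accurate sweet pepper tracking using weak supervision from VLMs. | foundation model |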

🔬 Pillar 2: RL & Architecture (6 papers)

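| # | Title | Summary | Tags |
|---|---|---|---|
| 8 | Decoding Visual Experience and Mapping Semantics through Whole-Brain Analysis Using fMRI Foundation Models | Uses fMRI foundation models for whole-brain analysis to decode visual experience and map semantic information. | contrastive learning, foundation model |
| 9 | SAMPart3D: Segment Any Part in 3D Objects | SAMPart3D: segments any part of a 3D object without text prompts. | distillation, foundation model |
| 10 | SynCL: A Synergistic Training Strategy with Instance-Aware Contrastive Learning for End-to-End Multi-Camera 3D Tracking | SynCL: a synergistic training strategy with instance-aware contrastive learning for end-to-end multi-camera 3D tracking. | contrastive learning |
| 11 | Multi-Stage Knowledge Integration of Vision-Language Models for Continual Learning | Proposes MulKI, a multi-stage knowledge integration network, to address catastrophic forgetting in continual learning of vision-language models. | distillation, multimodal |
| 12 | LFSamba: Marry SAM with Mamba for Light Field Salient Object Detection | LFSamba: a light field salient object detection model that combines SAM with Mamba. | Mamba |
| 13 | XPoint: A Self-Supervised Visual-State-Space based Architecture for Multispectral Image Registration | XPoint: a self-supervised, visual-state-space-based architecture for multispectral image registration. | Mamba, feature matching |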

🔬 Pillar 3: Perception & Semantics (4 papers)

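| # | Title | Summary | Tags |
|---|---|---|---|
| 14 | A Hierarchical Compression Technique for 3D Gaussian Splatting Compression | Proposes HGSC, a hierarchical compression technique that efficiently compresses 3D Gaussian Splatting data, improving storage and transmission efficiency. | 3D gaussian splatting, gaussian splatting, splatting |
| 15 | $SE(3)$ Equivariant Ray Embeddings for Implicit Multi-View Depth Estimation | Proposes an implicit multi-view depth estimation method based on $SE(3)$ equivariant ray embeddings. | depth estimation, stereo depth, scene understanding |
| 16 | LuSh-NeRF: Lighting up and Sharpening NeRFs for Low-light Scenes | LuSh-NeRF: addresses NeRF reconstruction in low-light scenes by lighting up and sharpening NeRFs. | NeRF, neural radiance field |
| 17 | Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models | Add-it: a training-free method for inserting objects into images, based on pretrained diffusion models. | affordance |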

🔬 Pillar 1: Robot Control (1 paper)

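| # | Title | Summary | Tags |
|---|---|---|---|
| 18 | DRIFTS: Optimizing Domain Randomization with Synthetic Data and Weight Interpolation for Fetal Brain Tissue Segmentation | DRIFTS: optimizes domain randomization by combining synthetic data with weight interpolation, for fetal brain tissue segmentation. | domain randomization |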

🔬 Pillar 6: Video Extraction (1 paper)

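| # | Title | Summary | Tags |
|---|---|---|---|
| 19 | HomoMatcher: Dense Feature Matching Results with Semi-Dense Efficiency by Homography Estimation | HomoMatcher: achieves dense feature matching results at semi-dense efficiency via homography estimation. | feature matching |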
