cs.CV (2024-11-11)

📊 19 papers in total | 🔗 7 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (7 🔗4) · Pillar 2: RL & Architecture (6 🔗2) · Pillar 3: Perception & Semantics (4 🔗1) · Pillar 1: Robot Control (1) · Pillar 6: Video Extraction (1)

🔬 Pillar 9: Embodied Foundation Models (7 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models	CapeLLM: support-free, category-agnostic pose estimation using multimodal large language models	large language model multimodal	✅
2	StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification	StoryTeller: improves long video description through global audio-visual character identification	large language model multimodal
3	OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision	OmniEdit: builds a generalist image editing model through specialist supervision, covering seven editing tasks at arbitrary aspect ratios	multimodal	✅
4	ConvMixFormer- A Resource-efficient Convolution Mixer for Transformer-based Dynamic Hand Gesture Recognition	Proposes ConvMixFormer, a resource-efficient convolution-mixer Transformer for dynamic hand gesture recognition	multimodal	✅
5	MapSAM: Adapting Segment Anything Model for Automated Feature Detection in Historical Maps	MapSAM: automated feature detection in historical maps through efficient fine-tuning of SAM	foundation model
6	UMFC: Unsupervised Multi-Domain Feature Calibration for Vision-Language Models	Proposes UMFC, an unsupervised multi-domain feature calibration method that improves the cross-domain generalization of vision-language models	zero-shot transfer	✅
7	Track Any Peppers: Weakly Supervised Sweet Pepper Tracking Using VLMs	Track Any Peppers: accurate sweet pepper tracking with weak supervision from VLMs	foundation model

🔬 Pillar 2: RL & Architecture (6 papers)

#	Title	One-line summary	Tags	🔗	⭐
8	Decoding Visual Experience and Mapping Semantics through Whole-Brain Analysis Using fMRI Foundation Models	Uses fMRI foundation models for whole-brain analysis to decode visual experience and map semantic information	contrastive learning foundation model
9	SAMPart3D: Segment Any Part in 3D Objects	SAMPart3D: segments any part of a 3D object without text prompts	distillation foundation model
10	SynCL: A Synergistic Training Strategy with Instance-Aware Contrastive Learning for End-to-End Multi-Camera 3D Tracking	SynCL: a synergistic training strategy with instance-aware contrastive learning for end-to-end multi-camera 3D tracking	contrastive learning
11	Multi-Stage Knowledge Integration of Vision-Language Models for Continual Learning	Proposes MulKI, a multi-stage knowledge integration network that addresses catastrophic forgetting in continual learning of vision-language models	distillation multimodal
12	LFSamba: Marry SAM with Mamba for Light Field Salient Object Detection	LFSamba: a light field salient object detection model that combines SAM with Mamba	Mamba	✅
13	XPoint: A Self-Supervised Visual-State-Space based Architecture for Multispectral Image Registration	XPoint: a self-supervised, visual-state-space-based architecture for multispectral image registration	Mamba feature matching	✅

🔬 Pillar 3: Perception & Semantics (4 papers)

#	Title	One-line summary	Tags	🔗	⭐
14	A Hierarchical Compression Technique for 3D Gaussian Splatting Compression	Proposes HGSC, a hierarchical compression technique that efficiently compresses 3D Gaussian Splatting data to improve storage and transmission efficiency	3D gaussian splatting gaussian splatting splatting
15	$SE(3)$ Equivariant Ray Embeddings for Implicit Multi-View Depth Estimation	Proposes an implicit multi-view depth estimation method based on $SE(3)$-equivariant ray embeddings	depth estimation stereo depth scene understanding
16	LuSh-NeRF: Lighting up and Sharpening NeRFs for Low-light Scenes	LuSh-NeRF: brightens and sharpens NeRFs to address NeRF reconstruction in low-light scenes	NeRF neural radiance field	✅
17	Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models	Add-it: a training-free method for inserting objects into images using pretrained diffusion models	affordance

🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
18	DRIFTS: Optimizing Domain Randomization with Synthetic Data and Weight Interpolation for Fetal Brain Tissue Segmentation	DRIFTS: optimizes domain randomization with synthetic data and weight interpolation for fetal brain tissue segmentation	domain randomization

🔬 Pillar 6: Video Extraction (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
19	HomoMatcher: Dense Feature Matching Results with Semi-Dense Efficiency by Homography Estimation	HomoMatcher: dense feature matching at semi-dense efficiency via homography estimation	feature matching
