cs.CV (2025-02-26)

📊 15 papers in total | 🔗 3 with code

🎯 Navigation by interest area

Pillar 9: Embodied Foundation Models (7 🔗1) | Pillar 2: RL & Architecture (4 🔗1) | Pillar 3: Perception & Semantics (3 🔗1) | Pillar 7: Motion Retargeting (1)

🔬 Pillar 9: Embodied Foundation Models (7 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	ImageChain: Advancing Sequential Image-to-Text Reasoning in Multimodal Large Language Models	ImageChain: strengthens sequential image-to-text reasoning in multimodal large language models through multi-turn conversation	large language model, multimodal
2	Tell me why: Visual foundation models as self-explainable classifiers	Proposes ProtoFM, a self-explainable classifier that combines visual foundation models with a prototypical architecture	foundation model	✅
3	A Survey on Foundation-Model-Based Industrial Defect Detection	Survey of foundation-model-based methods for industrial defect detection	foundation model
4	CLIP-Optimized Multimodal Image Enhancement via ISP-CNN Fusion for Coal Mine IoVT under Uneven Illumination	Proposes a multimodal image enhancement method based on ISP-CNN fusion and CLIP optimization for low-light coal mine IoVT scenes	multimodal
5	Improved YOLOv12 with LLM-Generated Synthetic Data for Enhanced Apple Detection and Benchmarking Against YOLOv11 and YOLOv10	Improves YOLOv12 with LLM-generated synthetic data, raising apple detection performance above YOLOv11 and YOLOv10	large language model
6	FungalZSL: Zero-Shot Fungal Classification with Image Captioning Using a Synthetic Data Approach	FungalZSL: zero-shot fungal classification using synthetic data and image captioning	large language model
7	Sherlock: Towards Multi-scene Video Abnormal Event Extraction and Localization via a Global-local Spatial-sensitive LLM	Proposes Sherlock, a model for extracting and localizing abnormal events in multi-scene video	large language model

🔬 Pillar 2: RL & Architecture (4 papers)

#	Title	One-line summary	Tags	🔗	⭐
8	EndoMamba: An Efficient Foundation Model for Endoscopic Videos via Hierarchical Pre-training	EndoMamba: an efficient foundation model for endoscopic video, built through hierarchical pre-training	Mamba, state space model, representation learning	✅
9	Pathology Report Generation and Multimodal Representation Learning for Cutaneous Melanocytic Lesions	Proposes automatic pathology report generation and multimodal representation learning for cutaneous melanocytic lesions	representation learning, multimodal
10	On the Importance of Text Preprocessing for Multimodal Representation Learning and Pathology Report Generation	Finds that, for pathology report generation, text preprocessing can effectively avoid hallucinations in multimodal representation learning	representation learning, multimodal
11	Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator	Proposes cross-context distillation and auxiliary guided distillation to improve monocular depth estimation	distillation, depth estimation, monocular depth

🔬 Pillar 3: Perception & Semantics (3 papers)

#	Title	One-line summary	Tags	🔗	⭐
12	ArtGS: Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting	ArtGS: builds interactable replicas of complex articulated objects using Gaussian splatting	gaussian splatting, splatting
13	The NeRF Signature: Codebook-Aided Watermarking for Neural Radiance Fields	Proposes NeRF Signature, a codebook-aided watermarking method for protecting the copyright of neural radiance fields	NeRF, neural radiance field	✅
14	Correspondence-Free Pose Estimation with Patterns: A Unified Approach for Multi-Dimensional Vision	Proposes a unified, pattern-based, correspondence-free pose estimation method for multi-dimensional vision	6D pose estimation

🔬 Pillar 7: Motion Retargeting (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
15	ProxyTransformation: Preshaping Point Cloud Manifold With Proxy Attention For 3D Visual Grounding	Proposes ProxyTransformation, which preshapes the point cloud manifold with proxy attention to improve 3D visual grounding	spatial relationship, multimodal, visual grounding
