cs.CV (2024-09-30)

📊 29 papers in total | 🔗 10 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (13 🔗5) · Pillar 3: Perception & Semantics (8 🔗3) · Pillar 2: RL & Architecture (3 🔗1) · Pillar 6: Video Extraction (2) · Pillar 7: Motion Retargeting (2 🔗1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 9: Embodied Foundation Models (13 papers)

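#	Title	One-line summary	Tags	🔗	⭐
1	VMAD: Visual-enhanced Multimodal Large Language Model for Zero-Shot Anomaly Detection	Proposes VMAD, a visual-enhanced multimodal large language model for zero-shot anomaly detection	large language model multimodal
2	MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning	MM1.5: improves image understanding and multi-image reasoning through data-driven multimodal LLM fine-tuning	large language model multimodal
3	Multimodal LLM Enhanced Cross-lingual Cross-modal Retrieval	Proposes LECCR, which uses a multimodal LLM to improve the alignment of visual and non-English representations in cross-lingual cross-modal retrieval	large language model multimodal	✅
4	Multimodal Alignment of Histopathological Images Using Cell Segmentation and Point Set Matching for Integrative Cancer Analysis	Proposes a multimodal histopathological image registration method based on cell segmentation and point set matching, for integrative cancer analysis	multimodal
5	AI Foundation Model for Heliophysics: Applications, Design, and Implementation	Designs an AI foundation model for heliophysics and explores its applications using the SDO dataset	foundation model
6	OpenKD: Opening Prompt Diversity for Zero- and Few-shot Keypoint Detection	OpenKD: opens up prompt diversity to enable zero-shot and few-shot keypoint detection	large language model foundation model multimodal	✅
7	Towards Unified Multimodal Editing with Enhanced Knowledge Collaboration	UniKE: unified multimodal editing through enhanced knowledge collaboration	multimodal	✅
8	World to Code: Multi-modal Data Generation via Self-Instructed Compositional Captioning and Filtering	Proposes World to Code, which generates high-quality multimodal data via self-instructed compositional captioning and filtering to improve vision-language model performance	multimodal visual grounding	✅
9	Visual Context Window Extension: A New Perspective for Long Video Understanding	Proposes a visual context window extension method to address the difficulty large models have with long video understanding	large language model multimodal
10	Exploring Social Media Image Categorization Using Large Models with Different Adaptation Methods: A Case Study on Cultural Nature's Contributions to People	Proposes the FLIPS dataset and explores large models for social media image categorization, focusing on Cultural Nature's Contributions to People	large language model
11	MM-Conv: A Multi-modal Conversational Dataset for Virtual Humans	Proposes MM-Conv, a multimodal conversational dataset for improving co-speech gesture generation for virtual humans	multimodal
12	Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos	VidAssist: uses LLMs for goal-oriented planning in instructional videos	large language model
13	ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer	Proposes ACE, a general-purpose image generation and editing model based on a Diffusion Transformer	large language model	✅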

🔬 Pillar 3: Perception & Semantics (8 papers)

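#	Title	One-line summary	Tags	🔗	⭐
14	Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels	Proposes PixelCLIP, which achieves open-vocabulary semantic segmentation using images without semantic labels	open-vocabulary open vocabulary foundation model	✅
15	Procedure-Aware Surgical Video-language Pretraining with Hierarchical Knowledge Augmentation	Proposes the PeskaVLP framework, which uses hierarchical knowledge augmentation to address the knowledge gap and spatio-temporal alignment problems in surgical video-language pretraining	scene understanding large language model zero-shot transfer	✅
16	SuperPose: Improved 6D Pose Estimation with Robust Tracking and Mask-Free Initialization	SuperPose: improves 6D pose estimation through robust tracking and mask-free initialization	6D pose estimation feature matching
17	OPONeRF: One-Point-One NeRF for Robust Neural Rendering	Proposes OPONeRF, which improves the robustness of NeRF rendering in dynamic scenes through personalized parameters and uncertainty modeling	NeRF
18	CCDepth: A Lightweight Self-supervised Depth Estimation Network with Enhanced Interpretability	CCDepth: a lightweight self-supervised depth estimation network with enhanced interpretability	depth estimation
19	Active Neural Mapping at Scale	Proposes a NeRF-based active neural mapping system for efficient exploration of large-scale indoor environments	NeRF
20	Class-Agnostic Visio-Temporal Scene Sketch Semantic Segmentation	Proposes CAVT, a class-agnostic visio-temporal network for semantic segmentation of scene sketches, and builds the large-scale FrISS dataset	scene understanding
21	DressRecon: Freeform 4D Human Reconstruction from Monocular Video	DressRecon: freeform 4D human reconstruction from monocular video, suited to loose clothing and object-interaction scenes	optical flow	✅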

🔬 Pillar 2: RL & Architecture (3 papers)

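#	Title	One-line summary	Tags	🔗	⭐
22	Survival Prediction in Lung Cancer through Multi-Modal Representation Learning	Proposes a multimodal representation learning method for lung cancer survival prediction that fuses CT, PET, and genomic data	predictive model representation learning
23	Domain Consistency Representation Learning for Lifelong Person Re-Identification	Proposes the Domain Consistency Representation learning (DCR) model to resolve the conflict between intra-domain discrimination and inter-domain consistency in lifelong ReID	representation learning distillation	✅
24	Classroom-Inspired Multi-Mentor Distillation with Adaptive Learning Strategies	Proposes ClassroomKD, an adaptive multi-mentor knowledge distillation framework that improves student model performance	distillation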

🔬 Pillar 6: Video Extraction (2 papers)

#	Title	One-line summary	Tags	🔗	⭐
25	REST-HANDS: Rehabilitation with Egocentric Vision Using Smartglasses for Treatment of Hands after Surviving Stroke	REST-HANDS: post-stroke hand rehabilitation using smartglasses and egocentric vision	egocentric egocentric vision
26	HEADS-UP: Head-Mounted Egocentric Dataset for Trajectory Prediction in Blind Assistance Systems	Proposes the HEADS-UP dataset for trajectory prediction from head-mounted cameras in blind assistance systems	egocentric

🔬 Pillar 7: Motion Retargeting (2 papers)

#	Title	One-line summary	Tags	🔗	⭐
27	TSdetector: Temporal-Spatial Self-correction Collaborative Learning for Colonoscopy Video Detection	Proposes TSdetector to address the detection of multiple polyps in colonoscopy videos	spatial relationship	✅
28	SATA: Spatial Autocorrelation Token Analysis for Enhancing the Robustness of Vision Transformers	Proposes SATA, which uses spatial autocorrelation to improve the robustness of Vision Transformers	spatial relationship

🔬 Pillar 8: Physics-based Animation (1 paper)

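#	Title	One-line summary	Tags	🔗	⭐
29	Masked Autoregressive Model for Weather Forecasting	Proposes the MAM4WF model, which combines masked modeling with autoregressive prediction to improve long-term weather forecasting accuracy	spatiotemporal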
