cs.CV (2025-05-05)

📊 24 papers in total | 🔗 5 with code

🎯 Navigate by Interest Area

Pillar 9: Embodied Foundation Models (13 🔗3) · Pillar 2: RL Algorithms & Architecture (4) · Pillar 3: Spatial Perception & Semantics (4 🔗1) · Pillar 1: Robot Control (2 🔗1) · Pillar 4: Generative Motion (1)

🔬 Pillar 9: Embodied Foundation Models (13 papers)

1. Multimodal Deep Learning for Stroke Prediction and Detection using Retinal Imaging and Clinical Data
   Proposes a multimodal deep-learning method based on retinal imaging and clinical data for stroke prediction and detection. (foundation model, multimodal)
2. AOR: Anatomical Ontology-Guided Reasoning for Medical Large Multimodal Model in Chest X-Ray Interpretation
   Proposes the AOR framework, which uses anatomical knowledge to strengthen the reasoning of medical large multimodal models in chest X-ray interpretation. (multimodal)
3. GAME: Learning Multimodal Interactions via Graph Structures for Personality Trait Estimation
   Proposes GAME, which learns multimodal interactions via graph structures for personality trait estimation. (multimodal)
4. DeepSparse: A Foundation Model for Sparse-View CBCT Reconstruction
   DeepSparse: a foundation model for sparse-view CBCT reconstruction that improves reconstruction quality while reducing radiation dose. (foundation model)
5. Detect, Classify, Act: Categorizing Industrial Anomalies with Multi-Modal Large Language Models
   Proposes VELM, which categorizes industrial anomalies with multimodal large language models, making anomaly detection more practically useful. (large language model)
6. Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
   Surveys unified multimodal understanding and generation models, analyzing architectural paradigms, challenges, and opportunities to guide future research. (multimodal)
7. Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction
   Ming-Lite-Uni: unifies a visual generator with a multimodal autoregressive model to enable natural multimodal interaction. (multimodal)
8. Timing Is Everything: Finding the Optimal Fusion Points in Multimodal Medical Imaging
   Proposes a sequential-forward-search method for optimizing fusion points in multimodal medical imaging, improving diagnostic accuracy. (multimodal)
9. Uncertainty-Weighted Image-Event Multimodal Fusion for Video Anomaly Detection
   Proposes a video anomaly detection method based on uncertainty-weighted image-event multimodal fusion. (multimodal)
10. Using Knowledge Graphs to harvest datasets for efficient CLIP model training
   Uses knowledge graphs to enhance dataset harvesting for efficient CLIP model training. (foundation model)
11. RGBX-DiffusionDet: A Framework for Multi-Modal RGB-X Object Detection Using DiffusionDet
   Proposes RGBX-DiffusionDet, which uses a diffusion model to fuse RGB images with heterogeneous 2D data for object detection. (multimodal)
12. Recent Advances in Out-of-Distribution Detection with CLIP-Like Models: A Survey
   A survey of OOD detection with CLIP-like models, proposing a new taxonomy from an image-text dual-modality perspective. (multimodal)
13. TeDA: Boosting Vision-Language Models for Zero-Shot 3D Object Retrieval via Testing-time Distribution Alignment
   Proposes TeDA, which improves the zero-shot 3D object retrieval performance of vision-language models via test-time distribution alignment. (multimodal)

🔬 Pillar 2: RL Algorithms & Architecture (4 papers)

14. R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
   Proposes the StableReinforce algorithm, improving the long-horizon reasoning ability and training stability of multimodal reward models. (reinforcement learning, reward design, large language model)
15. VAEmo: Efficient Representation Learning for Visual-Audio Emotion with Knowledge Injection
   VAEmo: efficiently learns visual-audio emotion representations via knowledge injection, improving AVER performance. (representation learning, contrastive learning, large language model)
16. Text to Image Generation and Editing: A Survey
   A comprehensive survey of text-to-image generation and editing techniques, with insights into future directions. (Mamba, classifier-free guidance, foundation model)
17. Learning 3D Persistent Embodied World Models
   Proposes an embodied world model with persistent memory for consistent long-horizon planning. (policy learning, world model)

🔬 Pillar 3: Spatial Perception & Semantics (4 papers)

18. Advancing Generalizable Tumor Segmentation with Anomaly-Aware Open-Vocabulary Attention Maps and Frozen Foundation Diffusion Models
   DiffuGTS: achieves generalizable tumor segmentation using anomaly-aware open-vocabulary attention maps and frozen foundation diffusion models. (open-vocabulary)
19. VGLD: Visually-Guided Linguistic Disambiguation for Monocular Depth Scale Recovery
   Proposes the VGLD framework, which recovers monocular depth scale through visually guided linguistic disambiguation. (depth estimation, monocular depth, metric depth)
20. 6D Pose Estimation on Spoons and Hands
   Proposes a 6D pose estimation system based on video object segmentation for tracking hand and spoon motion during meals. (6D pose estimation)
21. DELTA: Dense Depth from Events and LiDAR using Transformer's Attention
   DELTA: fuses event-camera and LiDAR data with Transformer attention to achieve accurate dense depth estimation. (depth estimation)

🔬 Pillar 1: Robot Control (2 papers)

22. MetaScenes: Towards Automated Replica Creation for Real-world 3D Scans
   MetaScenes: proposes an automated method for creating replicas of real-world 3D scans for embodied AI research. (manipulation, sim-to-real, embodied AI)
23. Sim2Real in endoscopy segmentation with a novel structure aware image translation
   Proposes a structure-aware image translation method for the Sim2Real problem in endoscopy image segmentation. (sim2real)

🔬 Pillar 4: Generative Motion (1 paper)

24. Scenethesis: A Language and Vision Agentic Framework for 3D Scene Generation
   Scenethesis: a language- and vision-agent framework for 3D scene generation. (physically plausible, penetration, embodied AI)
