cs.CV (2025-05-14)

📊 27 papers in total | 🔗 7 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (11 🔗3) Pillar 2: RL & Architecture (6 🔗1) Pillar 3: Perception & Semantics (6 🔗2) Pillar 8: Physics-based Animation (3 🔗1) Pillar 4: Generative Motion (1)

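🔬 Pillar 9: Embodied Foundation Models (11 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
1	FaceShield: Explainable Face Anti-Spoofing with Multimodal Large Language Models	Proposes FaceShield to address face anti-spoofing and improve explainability.	large language model multimodal	✅
2	BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset	BLIP3-o: a family of fully open unified multimodal models, with a comprehensive study of architecture, training, and dataset.	multimodal
3	Denoising and Alignment: Rethinking Domain Generalization for Multimodal Face Anti-Spoofing	Proposes the MMDA framework, which improves cross-domain generalization for face anti-spoofing through multimodal denoising and alignment.	multimodal
4	BioVFM-21M: Benchmarking and Scaling Self-Supervised Vision Foundation Models for Biomedical Image Analysis	BioVFM: builds and scales self-supervised vision foundation models for biomedical image analysis.	foundation model
5	Zero-Shot Multi-modal Large Language Model v.s. Supervised Deep Learning: A Comparative Study on CT-Based Intracranial Hemorrhage Subtyping	Comparative study: zero-shot multimodal large language models underperform supervised deep learning on CT-based intracranial hemorrhage subtyping.	large language model
6	Bias and Generalizability of Foundation Models across Datasets in Breast Mammography	Studies the bias and generalizability of foundation models in breast mammography and proposes a fairness-aware method.	foundation model
7	Relative Drawing Identification Complexity is Invariant to Modality in Vision-Language Models	Shows that the relative complexity of identifying drawings in vision-language models is invariant across modalities.	large language model multimodal
8	AMSnet 2.0: A Large AMS Database with AI Segmentation for Net Detection	Proposes an AI-segmentation-based method for circuit net detection and builds AMSnet 2.0, a large-scale AMS circuit database.	large language model multimodal
9	MetaUAS: Universal Anomaly Segmentation with One-Prompt Meta-Learning	MetaUAS: universal anomaly segmentation based on one-prompt meta-learning, with no vision-language model required.	foundation model	✅
10	Few-Shot Anomaly-Driven Generation for Anomaly Classification and Segmentation	Proposes few-shot anomaly-driven generation for anomaly detection and segmentation, improving industrial quality-inspection performance.	foundation model	✅
11	Beyond General Prompts: Automated Prompt Refinement using Contrastive Class Alignment Scores for Disambiguating Objects in Vision-Language Models	Proposes automated prompt refinement based on contrastive class alignment scores, improving object detection accuracy in vision-language models.	large language model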

🔬 Pillar 2: RL & Architecture (6 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
12	Dyadic Mamba: Long-term Dyadic Human Motion Synthesis	Dyadic Mamba: uses state-space models for long-term dyadic human motion synthesis.	Mamba SSM motion synthesis
13	Efficient Malicious UAV Detection Using Autoencoder-TSMamba Integration	Proposes an efficient malicious UAV detection method that integrates an autoencoder with TSMamba, improving detection accuracy and reducing computational complexity.	Mamba spatial relationship
14	MAKE: Multi-Aspect Knowledge-Enhanced Vision-Language Pretraining for Zero-shot Dermatological Assessment	Proposes the MAKE framework, which uses multi-aspect knowledge-enhanced vision-language pretraining for zero-shot dermatological assessment.	contrastive learning large language model multimodal
15	Seeing Beyond the Scene: Enhancing Vision-Language Models with Interactional Reasoning	Proposes the ISGR framework, which uses interactional reasoning to improve vision-language models' understanding of complex scenes.	reinforcement learning scene understanding spatial relationship
16	MrTrack: Register Mamba for Needle Tracking with Rapid Reciprocating Motion during Ultrasound-Guided Aspiration Biopsy	MrTrack: proposes a Mamba-based register mechanism for tracking needles with rapid reciprocating motion during ultrasound-guided aspiration biopsy.	Mamba	✅
17	Unsupervised Multiview Contrastive Language-Image Joint Learning with Pseudo-Labeled Prompts Via Vision-Language Model for 3D/4D Facial Expression Recognition	Proposes MultiviewVLM for 3D/4D facial expression recognition via unsupervised multiview contrastive language-image joint learning.	representation learning contrastive learning

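🔬 Pillar 3: Perception & Semantics (6 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
18	ExploreGS: a vision-based low overhead framework for 3D scene reconstruction	ExploreGS: a low-overhead, vision-based framework for UAV 3D scene reconstruction.	3D gaussian splatting 3DGS gaussian splatting
19	FreeDriveRF: Monocular RGB Dynamic NeRF without Poses for Autonomous Driving via Point-Level Dynamic-Static Decoupling	FreeDriveRF: a pose-free monocular RGB dynamic NeRF with point-level dynamic-static decoupling for autonomous driving scenes.	NeRF neural radiance field scene reconstruction
20	RobustSpring: Benchmarking Robustness to Image Corruptions for Optical Flow, Scene Flow and Stereo	RobustSpring: a benchmark for the robustness of optical flow, scene flow, and stereo to image corruptions.	optical flow scene flow
21	Neural Video Compression using 2D Gaussian Splatting	Proposes a neural video compression method based on 2D Gaussian splatting that speeds up encoding and reduces redundancy, targeting real-time video applications.	gaussian splatting splatting
22	Sparse Point Cloud Patches Rendering via Splitting 2D Gaussians	Proposes a sparse point cloud patch rendering method based on splitting 2D Gaussians, achieving cross-category generalization.	NeRF	✅
23	DRRNet: Macro-Micro Feature Fusion and Dual Reverse Refinement for Camouflaged Object Detection	DRRNet: macro-micro feature fusion and dual reverse refinement for camouflaged object detection.	scene understanding	✅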

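🔬 Pillar 8: Physics-based Animation (3 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
24	Using Foundation Models as Pseudo-Label Generators for Pre-Clinical 4D Cardiac CT Segmentation	Uses foundation models to generate pseudo-labels for pre-clinical 4D cardiac CT segmentation.	motion tracking foundation model
25	Contactless Cardiac Pulse Monitoring Using Event Cameras	Proposes a contactless heart-rate monitoring method based on event cameras, using a convolutional neural network to extract the pulse signal from facial event streams.	PULSE
26	BrainNetMLP: An Efficient and Effective Baseline for Functional Brain Network Classification	Proposes BrainNetMLP, an efficient and effective MLP-based baseline for functional brain network classification.	spatiotemporal	✅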

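🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
27	Text-driven Motion Generation: Overview, Challenges and Directions	Surveys text-driven motion generation, analyzing challenges and future directions for applications such as virtual reality.	text-to-motion text-driven motion motion synthesis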
