cs.CV (2025-01-28)

📊 16 papers in total | 🔗 1 with code

🎯 Navigation by Interest Area

Pillar 3: Perception & Semantics (5) · Pillar 2: RL & Architecture (5 🔗1) · Pillar 9: Embodied Foundation Models (4) · Pillar 4: Generative Motion (1) · Pillar 5: Interaction & Reaction (1)

🔬 Pillar 3: Perception & Semantics (5 papers)

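#	Title	One-sentence summary	Tags	🔗	⭐
1	Beyond-Labels: Advancing Open-Vocabulary Segmentation With Vision-Language Models	Proposes Beyond-Labels, which uses vision-language models to improve open-vocabulary semantic segmentation performance.	open-vocabulary open vocabulary foundation model
2	Modulating CNN Features with Pre-Trained ViT Representations for Open-Vocabulary Object Detection	Proposes VMCNet to address insufficient representations in open-vocabulary object detection.	open-vocabulary open vocabulary
3	Contextual Self-paced Learning for Weakly Supervised Spatio-Temporal Video Grounding	Proposes CoSPaL to address complex query understanding and temporal consistency in weakly supervised spatio-temporal video grounding.	scene understanding foundation model multimodal
4	Evaluating CrowdSplat: Perceived Level of Detail for Gaussian Crowds	Evaluates CrowdSplat, measuring perceived level of detail for Gaussian crowds in order to optimize real-time rendering.	3D gaussian splatting gaussian splatting splatting
5	Image Velocimetry using Direct Displacement Field estimation with Neural Networks for Fluids	Proposes an image velocimetry method that uses neural networks to estimate the displacement field directly, improving the spatial resolution of fluid velocity fields.	optical flow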

🔬 Pillar 2: RL & Architecture (5 papers)

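#	Title	One-sentence summary	Tags	🔗	⭐
6	VidSole: A Multimodal Dataset for Joint Kinetics Quantification and Disease Detection with Deep Learning	VidSole: builds a multimodal dataset and applies deep learning to gait-related disease detection and joint kinetics quantification.	MAE multimodal
7	FedEFM: Federated Endovascular Foundation Model with Unseen Data	FedEFM: a federated-learning foundation model for endovascular surgery that addresses the unseen-data problem.	distillation foundation model
8	A Contrastive Teacher-Student Framework for Novelty Detection under Style Shifts	Proposes a contrastive teacher-student framework for novelty detection under style shifts.	teacher-student distillation
9	Adversarial Masked Autoencoder Purifier with Defense Transferability	Proposes MAEP, a masked-autoencoder-based purifier for adversarial examples that improves defense transferability.	masked autoencoder MAE
10	CSPCL: Category Semantic Prior Contrastive Learning for Deformable DETR-Based Prohibited Item Detectors	Proposes Category Semantic Prior Contrastive Learning (CSPCL) to improve Deformable DETR performance on X-ray prohibited item detection.	contrastive learning	✅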

🔬 Pillar 9: Embodied Foundation Models (4 papers)

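#	Title	One-sentence summary	Tags	🔗	⭐
11	Exploring the Role of Explicit Temporal Modeling in Multimodal Large Language Models for Video Understanding	Proposes a Stackable Temporal Encoder to address temporal modeling in video understanding.	large language model multimodal
12	Molecular-driven Foundation Model for Oncologic Pathology	Threads: a molecular-driven foundation model for oncologic pathology that provides general-purpose representations of whole-slide images.	foundation model multimodal
13	Ultra-high resolution multimodal MRI densely labelled holistic structural brain atlas	Builds a holistic structural atlas of the human brain from ultra-high-resolution multimodal MRI, aimed at improving early detection of neurological diseases.	multimodal
14	One Head Eight Arms: Block Matrix based Low Rank Adaptation for CLIP-based Few-Shot Learning	Proposes Block-LoRA, a block-matrix-based low-rank adaptation method that improves the efficiency of CLIP models in few-shot learning.	foundation model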

🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
15	FlexMotion: Lightweight, Physics-Aware, and Controllable Human Motion Generation	Proposes FlexMotion to address efficiency and controllability in human motion generation.	motion synthesis motion generation physically plausible

🔬 Pillar 5: Interaction & Reaction (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
16	B-RIGHT: Benchmark Re-evaluation for Integrity in Generalized Human-Object Interaction Testing	B-RIGHT: a benchmark re-evaluation for integrity in generalized human-object interaction testing.	human-object interaction HOI
