cs.CV (2025-07-26)

📊 23 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (8 🔗3) | Pillar 3: Perception & Semantics (7 🔗2) | Pillar 9: Embodied Foundation Models (4 🔗1) | Pillar 8: Physics-based Animation (3) | Pillar 4: Generative Motion (1)

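🔬 Pillar 2: RL & Architecture (8 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
1	Region-based Cluster Discrimination for Visual Representation Learning	Proposes RICE, a visual representation learning method based on region-level cluster discrimination that improves performance on dense prediction tasks.	representation learning large language model multimodal	✅
2	HydraMamba: Multi-Head State Space Model for Global Point Cloud Learning	HydraMamba: a multi-head state space model for global point cloud learning that improves long-range dependency modeling.	Mamba state space model	✅
3	MambaVesselNet++: A Hybrid CNN-Mamba Architecture for Medical Image Segmentation	MambaVesselNet++: a hybrid CNN-Mamba architecture for medical image segmentation.	Mamba SSM state space model	✅
4	Self-Guided Masked Autoencoder	Proposes a self-guided masked autoencoder that uses internal clustering information to improve representation learning.	representation learning masked autoencoder MAE
5	SpecBPP: A Self-Supervised Learning Approach for Hyperspectral Representation and Soil Organic Carbon Estimation	SpecBPP: a self-supervised learning method for hyperspectral representation and soil organic carbon estimation.	representation learning masked autoencoder MAE
6	JDATT: A Joint Distillation Framework for Atmospheric Turbulence Mitigation and Target Detection	Proposes JDATT, a joint distillation framework for atmospheric turbulence mitigation and target detection.	Mamba distillation
7	A Structure-aware and Motion-adaptive Framework for 3D Human Pose Estimation with Mamba	Proposes the SAMA framework, which uses Mamba for structure-aware and motion-adaptive 3D human pose estimation.	Mamba
8	A mini-batch training strategy for deep subspace clustering networks	Proposes a memory-bank-based mini-batch deep subspace clustering network to address clustering of high-resolution images.	representation learning contrastive learning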

🔬 Pillar 3: Perception & Semantics (7 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
9	RaGS: Unleashing 3D Gaussian Splatting from 4D Radar and Monocular Cues for 3D Object Detection	RaGS: 3D object detection via 3D Gaussian splatting from 4D radar and monocular cues.	3D gaussian splatting gaussian splatting splatting
10	Interpretable Open-Vocabulary Referring Object Detection with Reverse Contrast Attention	Proposes Reverse Contrast Attention (RCA) to improve open-vocabulary referring object detection.	open-vocabulary open vocabulary multimodal	✅
11	UniCT Depth: Event-Image Fusion Based Monocular Depth Estimation with Convolution-Compensated ViT Dual SA Block	UniCT Depth: an event-image fusion method for monocular depth estimation based on a convolution-compensated ViT dual self-attention block.	depth estimation monocular depth scene understanding
12	FROSS: Faster-than-Real-Time Online 3D Semantic Scene Graph Generation from RGB-D Images	FROSS: a fast online method for generating 3D semantic scene graphs from RGB-D images.	scene understanding	✅
13	TrackAny3D: Transferring Pretrained 3D Models for Category-unified 3D Point Cloud Tracking	TrackAny3D: transfers pretrained 3D models to achieve category-unified 3D point cloud tracking.	MoGe
14	DepthFlow: Exploiting Depth-Flow Structural Correlations for Unsupervised Video Object Segmentation	DepthFlow: exploits depth-flow structural correlations for unsupervised video object segmentation.	optical flow
15	TransFlow: Motion Knowledge Transfer from Video Diffusion Models to Video Salient Object Detection	TransFlow: transfers motion knowledge from video diffusion models to improve video salient object detection.	optical flow

🔬 Pillar 9: Embodied Foundation Models (4 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
16	Predicting Brain Responses To Natural Movies With Multimodal LLMs	Uses multimodal LLMs to predict brain responses to natural movie stimuli; ranked fourth in the Algonauts 2025 Challenge.	multimodal
17	LLMControl: Grounded Control of Text-to-Image Diffusion-based Synthesis with Multimodal LLMs	LLMControl: uses multimodal LLMs for controllable generation with text-to-image diffusion models.	multimodal
18	OW-CLIP: Data-Efficient Visual Supervision for Open-World Object Detection via Human-AI Collaboration	Proposes OW-CLIP, which addresses open-world object detection through human-AI collaboration and data-efficient visual supervision.	large language model multimodal
19	ATCTrack: Aligning Target-Context Cues with Dynamic Target States for Robust Vision-Language Tracking	ATCTrack: robust vision-language tracking by aligning target-context cues with dynamic target states.	multimodal	✅

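🔬 Pillar 8: Physics-based Animation (3 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
20	A Fast Parallel Median Filtering Algorithm Using Hierarchical Tiling	Proposes a fast parallel median filtering algorithm based on hierarchical tiling that significantly speeds up filtering on GPUs.	PULSE
21	HumanSAM: Classifying Human-centric Forgery Videos in Human Spatial, Appearance, and Motion Anomaly	HumanSAM: classifies human-centric forgery videos by spatial, appearance, and motion anomalies.	spatiotemporal
22	A Machine Learning Framework for Predicting Microphysical Properties of Ice Crystals from Cloud Particle Imagery	Proposes a machine learning framework for predicting the microphysical properties of ice crystals from cloud particle imagery.	spatiotemporal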

🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
23	FineMotion: A Dataset and Benchmark with both Spatial and Temporal Annotation for Fine-grained Motion Generation and Editing	FineMotion: a dataset and benchmark with fine-grained spatial and temporal annotations for human motion generation and editing.	MDM motion generation
