cs.CV (2025-07-26)

📊 23 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (8 🔗3) · Pillar 3: Perception & Semantics (7 🔗2) · Pillar 9: Embodied Foundation Models (4 🔗1) · Pillar 8: Physics-based Animation (3) · Pillar 4: Generative Motion (1)

🔬 Pillar 2: RL & Architecture (8 papers)

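| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 1 | Region-based Cluster Discrimination for Visual Representation Learning | Proposes RICE, a visual representation learning method based on region-level cluster discrimination that improves performance on dense prediction tasks | representation learning, large language model, multimodal |
| 2 | HydraMamba: Multi-Head State Space Model for Global Point Cloud Learning | HydraMamba: a multi-head state space model for global point cloud learning that improves long-range dependency modeling | Mamba, state space model |
| 3 | MambaVesselNet++: A Hybrid CNN-Mamba Architecture for Medical Image Segmentation | MambaVesselNet++: a hybrid CNN-Mamba architecture for medical image segmentation | Mamba, SSM, state space model |
| 4 | Self-Guided Masked Autoencoder | Proposes a self-guided masked autoencoder that uses internal clustering information to improve representation learning | representation learning, masked autoencoder, MAE |
| 5 | SpecBPP: A Self-Supervised Learning Approach for Hyperspectral Representation and Soil Organic Carbon Estimation | SpecBPP: a self-supervised learning method for hyperspectral representation and soil organic carbon estimation | representation learning, masked autoencoder, MAE |
| 6 | JDATT: A Joint Distillation Framework for Atmospheric Turbulence Mitigation and Target Detection | Proposes JDATT, a joint distillation framework for atmospheric turbulence mitigation and target detection | Mamba, distillation |
| 7 | A Structure-aware and Motion-adaptive Framework for 3D Human Pose Estimation with Mamba | Proposes the SAMA framework, which uses Mamba for structure-aware and motion-adaptive 3D human pose estimation | Mamba |
| 8 | A mini-batch training strategy for deep subspace clustering networks | Proposes a memory-bank-based mini-batch deep subspace clustering network to address clustering of high-resolution images | representation learning, contrastive learning |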

🔬 Pillar 3: Perception & Semantics (7 papers)

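| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 9 | RaGS: Unleashing 3D Gaussian Splatting from 4D Radar and Monocular Cues for 3D Object Detection | RaGS: 3D object detection via 3D Gaussian Splatting using 4D radar and monocular cues | 3D gaussian splatting, gaussian splatting, splatting |
| 10 | Interpretable Open-Vocabulary Referring Object Detection with Reverse Contrast Attention | Proposes Reverse Contrast Attention (RCA) to improve open-vocabulary referring object detection | open-vocabulary, open vocabulary, multimodal |
| 11 | UniCT Depth: Event-Image Fusion Based Monocular Depth Estimation with Convolution-Compensated ViT Dual SA Block | UniCT Depth: an event-image fusion method for monocular depth estimation based on a convolution-compensated ViT dual self-attention block | depth estimation, monocular depth, scene understanding |
| 12 | FROSS: Faster-than-Real-Time Online 3D Semantic Scene Graph Generation from RGB-D Images | FROSS: a fast online method for generating 3D semantic scene graphs from RGB-D images | scene understanding |
| 13 | TrackAny3D: Transferring Pretrained 3D Models for Category-unified 3D Point Cloud Tracking | TrackAny3D: transfers pretrained 3D models to achieve category-unified 3D point cloud tracking | MoGe |
| 14 | DepthFlow: Exploiting Depth-Flow Structural Correlations for Unsupervised Video Object Segmentation | DepthFlow: exploits depth-flow structural correlations for unsupervised video object segmentation | optical flow |
| 15 | TransFlow: Motion Knowledge Transfer from Video Diffusion Models to Video Salient Object Detection | TransFlow: transfers motion knowledge from video diffusion models to improve video salient object detection | optical flow |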

🔬 Pillar 9: Embodied Foundation Models (4 papers)

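| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 16 | Predicting Brain Responses To Natural Movies With Multimodal LLMs | Uses multimodal LLMs to predict brain responses to natural movie stimuli; ranked fourth in the Algonauts 2025 Challenge | multimodal |
| 17 | LLMControl: Grounded Control of Text-to-Image Diffusion-based Synthesis with Multimodal LLMs | LLMControl: uses multimodal LLMs for controllable generation with text-to-image diffusion models | multimodal |
| 18 | OW-CLIP: Data-Efficient Visual Supervision for Open-World Object Detection via Human-AI Collaboration | Proposes OW-CLIP, which addresses open-world object detection through human-AI collaboration and data-efficient visual supervision | large language model, multimodal |
| 19 | ATCTrack: Aligning Target-Context Cues with Dynamic Target States for Robust Vision-Language Tracking | ATCTrack: robust vision-language tracking by aligning target-context cues with dynamic target states | multimodal |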

🔬 Pillar 8: Physics-based Animation (3 papers)

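| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 20 | A Fast Parallel Median Filtering Algorithm Using Hierarchical Tiling | Proposes a fast parallel median filtering algorithm based on hierarchical tiling that significantly speeds up filtering on GPUs | PULSE |
| 21 | HumanSAM: Classifying Human-centric Forgery Videos in Human Spatial, Appearance, and Motion Anomaly | HumanSAM: classifies human-centric forgery videos by spatial, appearance, and motion anomalies | spatiotemporal |
| 22 | A Machine Learning Framework for Predicting Microphysical Properties of Ice Crystals from Cloud Particle Imagery | Proposes a machine learning framework for predicting the microphysical properties of ice crystals from cloud particle imagery | spatiotemporal |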

🔬 Pillar 4: Generative Motion (1 paper)

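| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 23 | FineMotion: A Dataset and Benchmark with both Spatial and Temporal Annotation for Fine-grained Motion Generation and Editing | FineMotion: a dataset and benchmark with fine-grained spatial and temporal annotations for human motion generation and editing | MDM, motion generation |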
