cs.CV (2025-05-06)

📊 22 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (7 🔗3) · Pillar 9: Embodied Foundation Models (7 🔗2) · Pillar 2: RL & Architecture (4) · Pillar 7: Motion Retargeting (2 🔗1) · Pillar 4: Generative Motion (1) · Pillar 6: Video Extraction (1)

🔬 Pillar 3: Perception & Semantics (7 papers)

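| # | Title | One-Sentence Summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | 3D Gaussian Splatting Data Compression with Mixture of Priors | Proposes a mixture-of-priors method for compressing 3D Gaussian Splatting data, improving storage and transmission efficiency. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 2 | LiftFeat: 3D Geometry-Aware Local Feature Matching | LiftFeat: proposes a 3D geometry-aware local feature matching method that improves the robustness of SLAM and visual localization in harsh environments. | depth estimation, monocular depth, feature matching | |
| 3 | Show or Tell? A Benchmark To Evaluate Visual and Textual Prompts in Semantic Segmentation | Proposes the Show or Tell benchmark for evaluating visual and textual prompts in semantic segmentation. | open-vocabulary, open vocabulary, large language model | |
| 4 | OS-W2S: An Automatic Labeling Engine for Language-Guided Open-Set Aerial Object Detection | Proposes OS-W2S, an automatic labeling engine, and uses it to build MI-OAD, a large-scale dataset for language-guided open-set aerial object detection. | open-vocabulary, open vocabulary, visual grounding | |
| 5 | TimeTracker: Event-based Continuous Point Tracking for Video Frame Interpolation with Non-linear Motion | TimeTracker: event-camera-based continuous point tracking for video frame interpolation, addressing non-linear motion. | optical flow, spatiotemporal | |
| 6 | Read My Ears! Horse Ear Movement Detection for Equine Affective State Assessment | Proposes a horse ear movement detection method based on deep learning and optical flow for assessing equine affective state. | optical flow | |
| 7 | 3D Surface Reconstruction with Enhanced High-Frequency Details | FreNeuS: uses high-frequency information to enhance detail in neural implicit 3D surface reconstruction. | implicit representation | |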

🔬 Pillar 9: Embodied Foundation Models (7 papers)

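| # | Title | One-Sentence Summary | Tags | 🔗 |
|---|---|---|---|---|
| 8 | Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning | Proposes UnifiedReward-Think, a unified multimodal CoT reward model based on reinforcement fine-tuning. | multimodal, chain-of-thought | |
| 9 | PhysLLM: Harnessing Large Language Models for Cross-Modal Remote Physiological Sensing | PhysLLM: uses large language models for cross-modal remote physiological sensing. | large language model | |
| 10 | UPMAD-Net: A Brain Tumor Segmentation Network with Uncertainty Guidance and Adaptive Multimodal Feature Fusion | UPMAD-Net: a brain tumor segmentation network combining uncertainty guidance and adaptive multimodal feature fusion. | multimodal | |
| 11 | Towards Efficient Benchmarking of Foundation Models in Remote Sensing: A Capabilities Encoding Approach | Proposes a capabilities encoding method for remote sensing foundation models that efficiently predicts model performance on downstream tasks. | foundation model | |
| 12 | Deep Learning for Sports Video Event Detection: Tasks, Datasets, Methods, and Challenges | Surveys deep learning for sports video event detection, clarifying task definitions, methods, and challenges. | multimodal | |
| 13 | Multi-Agent System for Comprehensive Soccer Understanding | Proposes SoccerAgent, a multi-agent system for comprehensive soccer understanding tasks. | multimodal | |
| 14 | SD-VSum: A Method and Dataset for Script-Driven Video Summarization | Proposes SD-VSum, a script-driven video summarization method and dataset that enables user-customized video summaries. | multimodal | |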

🔬 Pillar 2: RL & Architecture (4 papers)

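| # | Title | One-Sentence Summary | Tags | 🔗 |
|---|---|---|---|---|
| 15 | Fill the Gap: Quantifying and Reducing the Modality Gap in Image-Text Representation Learning | Proposes measurement and optimization methods to reduce the modality gap in image-text representation learning. | representation learning, multimodal | |
| 16 | MambaStyle: Efficient StyleGAN Inversion for Real Image Editing with State-Space Models | MambaStyle: uses state-space models for efficient StyleGAN inversion and image editing. | Mamba | |
| 17 | Real-Time Person Image Synthesis Using a Flow Matching Model | Proposes RPFM, a Flow Matching-based model for real-time pose-guided person image synthesis. | flow matching | |
| 18 | seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models | seq-JEPA: learns invariant-equivariant world models through autoregressive prediction, addressing the representation trade-off. | world model | |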

🔬 Pillar 7: Motion Retargeting (2 papers)

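| # | Title | One-Sentence Summary | Tags | 🔗 |
|---|---|---|---|---|
| 19 | Fixed-Length Dense Fingerprint Representation | Proposes the FLARE framework, which uses a fixed-length dense fingerprint representation for efficient matching of cross-modal and low-quality fingerprints. | spatial relationship | |
| 20 | Blending 3D Geometry and Machine Learning for Multi-View Stereopsis | Proposes GC MVSNet++, which accelerates multi-view stereo learning through multi-view, multi-scale geometric consistency constraints. | geometric consistency | |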

🔬 Pillar 4: Generative Motion (1 paper)

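| # | Title | One-Sentence Summary | Tags | 🔗 |
|---|---|---|---|---|
| 21 | StableMotion: Training Motion Cleanup Models with Unpaired Corrupted Data | StableMotion: trains motion cleanup models on unpaired corrupted data to improve motion capture quality. | motion generation | |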

🔬 Pillar 6: Video Extraction (1 paper)

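| # | Title | One-Sentence Summary | Tags | 🔗 |
|---|---|---|---|---|
| 22 | GUAVA: Generalizable Upper Body 3D Gaussian Avatar | GUAVA: proposes a generalizable upper-body 3D Gaussian avatar reconstruction framework that enables fast animation and rendering. | SMPL-X | |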
