cs.CV(2025-05-06)
📊 22 papers in total | 🔗 6 with code
🎯 Interest Area Navigation
Pillar 3: Spatial Perception & Semantics (7 🔗3)
Pillar 9: Embodied Foundation Models (7 🔗2)
Pillar 2: RL Algorithms & Architecture (4)
Pillar 7: Motion Retargeting (2 🔗1)
Pillar 4: Generative Motion (1)
Pillar 6: Video Extraction & Matching (1)
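🔬 Pillar 3: Spatial Perception & Semantics (7 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | 3D Gaussian Splatting Data Compression with Mixture of Priors | Proposes a mixture-of-priors compression method for 3D Gaussian Splatting data, improving storage and transmission efficiency. | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 2 | LiftFeat: 3D Geometry-Aware Local Feature Matching | LiftFeat: proposes a 3D geometry-aware local feature matching method that improves the robustness of SLAM and visual localization in adverse environments. | depth estimation, monocular depth, feature matching | ✅ | |
| 3 | Show or Tell? A Benchmark To Evaluate Visual and Textual Prompts in Semantic Segmentation | Proposes the Show or Tell benchmark for evaluating the performance of visual and textual prompts in semantic segmentation. | open-vocabulary, open vocabulary, large language model | | |
| 4 | OS-W2S: An Automatic Labeling Engine for Language-Guided Open-Set Aerial Object Detection | Proposes the OS-W2S automatic labeling engine and builds MI-OAD, a large-scale dataset for language-guided open-set aerial object detection. | open-vocabulary, open vocabulary, visual grounding | ✅ | |
| 5 | TimeTracker: Event-based Continuous Point Tracking for Video Frame Interpolation with Non-linear Motion | TimeTracker: event-camera-based continuous point tracking for video frame interpolation, addressing the challenge of non-linear motion. | optical flow, spatiotemporal | | |
| 6 | Read My Ears! Horse Ear Movement Detection for Equine Affective State Assessment | Proposes a horse ear movement detection method based on deep learning and optical flow for assessing equine affective state. | optical flow | ✅ | |
| 7 | 3D Surface Reconstruction with Enhanced High-Frequency Details | FreNeuS: uses high-frequency information to enhance detail in neural implicit 3D surface reconstruction. | implicit representation | | |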
🔬 Pillar 9: Embodied Foundation Models (7 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning | Proposes UnifiedReward-Think, a unified multimodal CoT reward model based on reinforcement fine-tuning. | multimodal, chain-of-thought | | |
| 9 | PhysLLM: Harnessing Large Language Models for Cross-Modal Remote Physiological Sensing | PhysLLM: uses large language models for cross-modal remote physiological sensing. | large language model | | |
| 10 | UPMAD-Net: A Brain Tumor Segmentation Network with Uncertainty Guidance and Adaptive Multimodal Feature Fusion | UPMAD-Net: a brain tumor segmentation network that combines uncertainty guidance with adaptive multimodal feature fusion. | multimodal | ✅ | |
| 11 | Towards Efficient Benchmarking of Foundation Models in Remote Sensing: A Capabilities Encoding Approach | Proposes a capabilities encoding approach for remote sensing foundation models that efficiently predicts model performance on downstream tasks. | foundation model | ✅ | |
| 12 | Deep Learning for Sports Video Event Detection: Tasks, Datasets, Methods, and Challenges | Surveys deep learning for sports video event detection, clarifying task definitions, methods, and challenges. | multimodal | | |
| 13 | Multi-Agent System for Comprehensive Soccer Understanding | Proposes SoccerAgent, a multi-agent system for comprehensive soccer understanding tasks. | multimodal | | |
| 14 | SD-VSum: A Method and Dataset for Script-Driven Video Summarization | Proposes SD-VSum, a script-driven video summarization method and dataset that enables user-customized video summaries. | multimodal | | |
🔬 Pillar 2: RL Algorithms & Architecture (4 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 15 | Fill the Gap: Quantifying and Reducing the Modality Gap in Image-Text Representation Learning | Proposes measurement and optimization methods that narrow the modality gap in image-text representation learning. | representation learning, multimodal | | |
| 16 | MambaStyle: Efficient StyleGAN Inversion for Real Image Editing with State-Space Models | MambaStyle: uses state-space models for efficient StyleGAN inversion and image editing. | Mamba | | |
| 17 | Real-Time Person Image Synthesis Using a Flow Matching Model | Proposes RPFM, a Flow Matching-based model for real-time pose-guided person image synthesis. | flow matching | | |
| 18 | seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models | seq-JEPA: learns invariant-equivariant world models through autoregressive prediction, addressing the representation trade-off. | world model | | |
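🔬 Pillar 7: Motion Retargeting (2 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 19 | Fixed-Length Dense Fingerprint Representation | Proposes the FLARE framework, which uses fixed-length dense fingerprint representations for efficient matching of cross-modal and low-quality fingerprints. | spatial relationship | ✅ | |
| 20 | Blending 3D Geometry and Machine Learning for Multi-View Stereopsis | Proposes GC MVSNet++, which accelerates multi-view stereo learning through multi-view, multi-scale geometric consistency constraints. | geometric consistency | | |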
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 21 | StableMotion: Training Motion Cleanup Models with Unpaired Corrupted Data | StableMotion: trains motion cleanup models on unpaired corrupted data, improving motion capture quality. | motion generation | | |
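🔬 Pillar 6: Video Extraction & Matching (1 paper)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 22 | GUAVA: Generalizable Upper Body 3D Gaussian Avatar | GUAVA: proposes a generalizable upper-body 3D Gaussian avatar reconstruction framework that enables fast animation and rendering. | SMPL-X | | |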