cs.CV (2025-02-25)
📊 20 papers in total | 🔗 7 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (7 🔗4)
Pillar 3: Perception & Semantics (4)
Pillar 2: RL & Architecture (3 🔗1)
Pillar 8: Physics-based Animation (3 🔗1)
Pillar 6: Video Extraction (3 🔗1)
🔬 Pillar 9: Embodied Foundation Models (7 papers)
🔬 Pillar 3: Perception & Semantics (4 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting | UniGS: proposes a unified language-image-3D pretraining method based on Gaussian splatting | 3D gaussian splatting 3DGS gaussian splatting | | |
| 9 | OpenFly: A Comprehensive Platform for Aerial Vision-Language Navigation | OpenFly: a comprehensive platform and large-scale benchmark dataset for aerial vision-language navigation | 3D gaussian splatting gaussian splatting splatting | | |
| 10 | VLM-E2E: Enhancing End-to-End Autonomous Driving with Multimodal Driver Attention Fusion | VLM-E2E: enhances end-to-end autonomous driving through multimodal driver attention fusion | scene understanding multimodal | | |
| 11 | Realistic Clothed Human and Object Joint Reconstruction from a Single Image | Proposes a framework based on implicit representations and attention mechanisms for reconstructing a realistic clothed human and object from a single image | implicit representation | | |
🔬 Pillar 2: RL & Architecture (3 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 12 | Progressive Local Alignment for Medical Multimodal Pre-training | Proposes PLAN, a progressive local alignment network that improves medical multimodal pre-training | contrastive learning multimodal | | |
| 13 | SYNTHIA: Novel Concept Design with Affordance Composition | SYNTHIA: a novel concept design framework based on affordance composition | curriculum learning affordance | | |
| 14 | OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference | OmniAlign-V: a dataset and evaluation benchmark for improving the alignment of multimodal large language models with human preference | DPO direct preference optimization large language model | ✅ | |
🔬 Pillar 8: Physics-based Animation (3 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 15 | ASurvey: Spatiotemporal Consistency in Video Generation | Survey: research progress on spatiotemporal consistency in video generation | spatiotemporal foundation model | | |
| 16 | LightFC-X: Lightweight Convolutional Tracker for RGB-X Tracking | Proposes LightFC-X, a lightweight convolutional RGB-X tracker for multimodal object tracking on resource-constrained devices | spatiotemporal multimodal | ✅ | |
| 17 | A digital eye-fixation biomarker using a deep anomaly scheme to classify Parkisonian patterns | Proposes an eye-fixation biomarker based on deep anomaly detection for classifying Parkinsonian patterns | spatiotemporal | | |
🔬 Pillar 6: Video Extraction (3 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 18 | EgoSim: An Egocentric Multi-view Simulator and Real Dataset for Body-worn Cameras during Motion and Activity | Proposes EgoSim to address motion and activity recognition with body-worn cameras | egocentric motion tracking | | |
| 19 | Personalized Federated Learning for Egocentric Video Gaze Estimation with Comprehensive Parameter Frezzing | Proposes FedCPF, which achieves personalized federated learning for gaze estimation through comprehensive parameter freezing | egocentric Ego4D | | |
| 20 | Task Graph Maximum Likelihood Estimation for Procedural Activity Understanding in Egocentric Videos | Proposes a gradient-based maximum likelihood estimation method for task graphs, used to understand procedural activities in egocentric videos | egocentric | ✅ | |