cs.CV(2025-12-24)
📊 25 papers in total | 🔗 5 with code
🎯 Interest area navigation
Pillar 9: Embodied Foundation Models (9 🔗3)
Pillar 2: RL & Architecture (8 🔗1)
Pillar 3: Perception & Semantics (5 🔗1)
Pillar 5: Interaction & Reaction (1)
Pillar 1: Robot Control (1)
Pillar 4: Generative Motion (1)
🔬 Pillar 9: Embodied Foundation Models (9 papers)
🔬 Pillar 2: RL & Architecture (8 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 10 | Multimodal Skeleton-Based Action Representation Learning via Decomposition and Composition | Proposes a decomposition-and-composition framework for multimodal skeleton-based action representation learning, improving efficiency and performance. | representation learning multimodal | | |
| 11 | SegMo: Segment-aligned Text to 3D Human Motion Generation | Proposes SegMo, a framework that aligns text and motion segments for finer-grained text-driven 3D human motion generation. | contrastive learning motion generation | | |
| 12 | Surgical Scene Segmentation using a Spike-Driven Video Transformer with Real-Time Potential | Proposes SpikeSurgSeg, a spiking-neural-network-based video Transformer for real-time surgical scene segmentation. | representation learning scene understanding spatiotemporal | | |
| 13 | TICON: A Slide-Level Tile Contextualizer for Histopathology Representation Learning | TICON: a slide-level context modeling method for histopathology representation learning. | representation learning foundation model | | |
| 14 | Learning from Next-Frame Prediction: Autoregressive Video Modeling Encodes Effective Representations | NExT-Vid: an autoregressive video modeling framework based on next-frame prediction that improves video representation learning. | flow matching representation learning visual pre-training | | |
| 15 | Self-supervised Multiplex Consensus Mamba for General Image Fusion | Proposes the SMC-Mamba framework for general image fusion, improving performance across multiple fusion tasks. | Mamba contrastive learning | | |
| 16 | PUFM++: Point Cloud Upsampling via Enhanced Flow Matching | Proposes PUFM++ to address sparse point cloud upsampling. | flow matching | ✅ | |
| 17 | XGrid-Mapping: Explicit Implicit Hybrid Grid Submaps for Efficient Incremental Neural LiDAR Mapping | Proposes XGrid-Mapping, which uses explicit-implicit hybrid grid submaps for efficient incremental neural LiDAR mapping. | distillation implicit representation | | |
🔬 Pillar 3: Perception & Semantics (5 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 18 | Quantile Rendering: Efficiently Embedding High-dimensional Feature on 3D Gaussian Splatting | Proposes Quantile Rendering, which efficiently embeds high-dimensional features in 3D Gaussian Splatting and improves open-vocabulary segmentation. | 3D gaussian splatting gaussian splatting splatting | | |
| 19 | ORCA: Object Recognition and Comprehension for Archiving Marine Species | ORCA: a multimodal benchmark for object recognition and comprehension for archiving marine species. | open-vocabulary open vocabulary visual grounding | | |
| 20 | Optical Flow-Guided 6DoF Object Pose Tracking with an Event Camera | Proposes an optical-flow-guided 6DoF object pose tracking method for event cameras. | optical flow | | |
| 21 | Towards Arbitrary Motion Completing via Hierarchical Continuous Representation | Proposes NAME, a framework based on hierarchical continuous representation, for motion completion at arbitrary frame rates. | implicit representation | | |
| 22 | UniPR-3D: Towards Universal Visual Place Recognition with Visual Geometry Grounded Transformer | Proposes UniPR-3D, which uses a Visual Geometry Grounded Transformer for universal visual place recognition. | VGGT | ✅ | |
🔬 Pillar 5: Interaction & Reaction (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 23 | DGSAN: Dual-Graph Spatiotemporal Attention Network for Pulmonary Nodule Malignancy Prediction | Proposes a dual-graph spatiotemporal attention network to improve the accuracy of pulmonary nodule malignancy prediction. | mutual attention spatiotemporal multimodal | | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 24 | Human Motion Estimation with Everyday Wearables | EveryWear: lightweight, calibration-free full-body human motion estimation using everyday wearables. | sim-to-real teacher-student egocentric | | |
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 25 | ACD: Direct Conditional Control for Video Diffusion Models via Attention Supervision | ACD: direct conditional control for video diffusion models via attention supervision. | classifier-free guidance | | |