cs.CV (2025-07-21)
📊 30 papers in total | 🔗 6 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (9 🔗2)
Pillar 3: Perception & Semantics (7 🔗1)
Pillar 2: RL & Architecture (6 🔗1)
Pillar 6: Video Extraction (3)
Pillar 1: Robot Control (2 🔗1)
Pillar 5: Interaction & Reaction (1 🔗1)
Pillar 8: Physics-based Animation (1)
Pillar 7: Motion Retargeting (1)
🔬 Pillar 9: Embodied Foundation Models (9 papers)
🔬 Pillar 3: Perception & Semantics (7 papers)
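🔬 Pillar 2: RL & Architecture (6 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 17 | MeshMamba: State Space Models for Articulated 3D Mesh Generation and Reconstruction | MeshMamba: uses state space models for articulated 3D mesh generation and reconstruction | Mamba SSM state space model | | |
| 18 | CLAMP: Contrastive Learning with Adaptive Multi-loss and Progressive Fusion for Multimodal Aspect-Based Sentiment Analysis | Proposes the CLAMP framework, which addresses cross-modal alignment in multimodal sentiment analysis through contrastive learning and adaptive multi-loss fusion. | contrastive learning multimodal | | |
| 19 | Few-Shot Object Detection via Spatial-Channel State Space Model | Proposes a spatial-channel state space model for few-shot object detection | Mamba state space model spatial relationship | | |
| 20 | Visual-Language Model Knowledge Distillation Method for Image Quality Assessment | Proposes an image quality assessment method based on vision-language model knowledge distillation, improving model efficiency and local feature recognition. | distillation multimodal | | |
| 21 | Local Dense Logit Relations for Enhanced Knowledge Distillation | Proposes Local Dense Relational Logit Distillation (LDRLD), which improves knowledge distillation through fine-grained logit relations. | distillation | | |
| 22 | Efficient Face Image Quality Assessment via Self-training and Knowledge Distillation | Proposes an efficient face image quality assessment method based on self-training and knowledge distillation, suitable for real-world deployment. | distillation | ✅ | |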
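🔬 Pillar 6: Video Extraction (3 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 23 | Is Tracking really more challenging in First Person Egocentric Vision? | Presents a benchmark study of first-person object tracking that separates the challenges due to viewpoint from those due to the scene. | egocentric egocentric vision first-person view | | |
| 24 | SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction | Proposes the SeC framework, which uses concept construction to address the difficulty of semantic understanding in complex video segmentation | feature matching | | |
| 25 | Procedure Learning via Regularized Gromov-Wasserstein Optimal Transport | Proposes a self-supervised procedure learning framework based on regularized Gromov-Wasserstein optimal transport | egocentric | | |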
🔬 Pillar 1: Robot Control (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 26 | Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos | Being-H0: a vision-language-action model pretrained on large-scale human videos, improving dexterous manipulation. | manipulation sim-to-real motion generation | ✅ | |
| 27 | Discovering and using Spelke segments | Proposes SpelkeNet, which discovers Spelke objects by predicting object motion relationships, improving performance on physical interaction tasks. | manipulation world model affordance | | |
🔬 Pillar 5: Interaction & Reaction (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 28 | HOLa: Zero-Shot HOI Detection with Low-Rank Decomposed VLM Feature Adaptation | HOLa: a zero-shot HOI detection method based on low-rank decomposed VLM feature adaptation | human-object interaction HOI | ✅ | |
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 29 | EgoPrune: Efficient Token Pruning for Egomotion Video Reasoning in Embodied Agent | EgoPrune: an efficient token pruning method for egomotion video reasoning in embodied agents | spatiotemporal embodied AI multimodal | | |
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 30 | Toward a Real-Time Framework for Accurate Monocular 3D Human Pose Estimation with Geometric Priors | Proposes a real-time framework for monocular 3D human pose estimation that incorporates geometric priors | human motion | | |