cs.CV(2025-01-27)
📊 23 papers in total | 🔗 3 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (8)
Pillar 3: Perception & Semantics (6 🔗1)
Pillar 2: RL & Architecture (6 🔗1)
Pillar 4: Generative Motion (2 🔗1)
Pillar 7: Motion Retargeting (1)
🔬 Pillar 9: Embodied Foundation Models (8 papers)
🔬 Pillar 3: Perception & Semantics (6 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 9 | Deformable Beta Splatting | Proposes Deformable Beta Splatting for 3D radiance field reconstruction | 3D gaussian splatting, 3DGS, gaussian splatting | ✅ | |
| 10 | Controllable Hand Grasp Generation for HOI and Efficient Evaluation Methods | Proposes a controllable hand grasp generation method based on a high-order geometric representation, and designs efficient evaluation metrics | affordance, HOI | | |
| 11 | Generating customized prompts for Zero-Shot Rare Event Medical Image Classification using LLM | Uses an LLM to generate customized prompts for zero-shot rare-event medical image classification | open-vocabulary, open vocabulary, large language model | | |
| 12 | Automatic Calibration of a Multi-Camera System with Limited Overlapping Fields of View for 3D Surgical Scene Reconstruction | Proposes an automatic multi-camera calibration method based on multi-scale marker projection for 3D surgical scene reconstruction | scene reconstruction | | |
| 13 | PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding | Proposes the PhysBench benchmark and the PhysAgent framework to improve vision-language models' understanding of the physical world | scene understanding, embodied AI | | |
| 14 | LinPrim: Linear Primitives for Differentiable Volumetric Rendering | Proposes a volumetric rendering method based on linear primitives for efficient, differentiable novel view synthesis | NeRF | | |
🔬 Pillar 2: RL & Architecture (6 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 15 | Distilling foundation models for robust and efficient models in digital pathology | Proposes the H0-mini model, which uses knowledge distillation to improve the robustness and efficiency of models in digital pathology | distillation, foundation model | | |
| 16 | A Survey on Computational Pathology Foundation Models: Datasets, Adaptation Strategies, and Evaluation Tasks | A survey of foundation models in computational pathology: datasets, adaptation strategies, and evaluation tasks | contrastive learning, foundation model | | |
| 17 | Directing Mamba to Complex Textures: An Efficient Texture-Aware State Space Model for Image Restoration | Proposes TAMambaIR, an efficient texture-aware state space model for image restoration | Mamba, state space model | | |
| 18 | ARFlow: Autoregressive Flow with Hybrid Linear Attention | ARFlow: a flow model that combines autoregressive modeling with a hybrid linear attention mechanism to improve image generation quality | linear attention, classifier-free guidance | | |
| 19 | The Linear Attention Resurrection in Vision Transformer | Proposes L$^2$ViT, which combines linear attention with local attention for efficient global representation learning | linear attention | | |
| 20 | NanoHTNet: Nano Human Topology Network for Efficient 3D Human Pose Estimation | Proposes NanoHTNet to address the efficiency of 3D human pose estimation on edge devices | contrastive learning, implicit representation | ✅ | |
🔬 Pillar 4: Generative Motion (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 21 | PackDiT: Joint Human Motion and Text Generation via Mutual Prompting | PackDiT: joint human motion and text generation via mutual prompting | text-to-motion, motion generation | | |
| 22 | BAG: Body-Aligned 3D Wearable Asset Generation | Proposes BAG, a body-aligned 3D wearable asset generation method that lets generated assets be worn automatically | penetration | ✅ | |
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 23 | SketchYourSeg: Mask-Free Subjective Image Segmentation via Freehand Sketches | SketchYourSeg: proposes a sketch-based, mask-free framework for subjective image segmentation | spatial relationship, foundation model | | |