cs.CV (2024-09-10)
📊 15 papers in total | 🔗 6 with code
🎯 Interest Area Navigation
Pillar 3: Perception & Semantics (4 🔗1)
Pillar 2: RL & Architecture (4 🔗3)
Pillar 9: Embodied Foundation Models (4 🔗1)
Pillar 4: Generative Motion (1)
Pillar 1: Robot Control (1)
Pillar 6: Video Extraction (1 🔗1)
🔬 Pillar 3: Perception & Semantics (4 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | gsplat: An Open-Source Library for Gaussian Splatting | gsplat: an open-source library for Gaussian splatting that speeds up training and reduces memory usage. | gaussian splatting splatting NeRF | ✅ | |
| 2 | Neuromorphic spatiotemporal optical flow: Enabling ultrafast visual perception beyond human capabilities | Proposes a neuromorphic spatiotemporal optical flow method that enables ultrafast visual perception beyond human capabilities. | optical flow spatiotemporal | | |
| 3 | GigaGS: Scaling up Planar-Based 3D Gaussians for Large Scene Surface Reconstruction | GigaGS: scales planar-based 3D Gaussians to surface reconstruction of large scenes. | 3D gaussian splatting 3DGS gaussian splatting | | |
| 4 | LEIA: Latent View-invariant Embeddings for Implicit 3D Articulation | LEIA: proposes latent view-invariant embeddings for implicit 3D articulation, with no motion information required. | NeRF neural radiance field | | |
🔬 Pillar 2: RL & Architecture (4 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 5 | EyeCLIP: A visual-language foundation model for multi-modal ophthalmic image analysis | EyeCLIP: a visual-language foundation model for multi-modal ophthalmic image analysis. | contrastive learning foundation model | | |
| 6 | Loss Distillation via Gradient Matching for Point Cloud Completion with Weighted Chamfer Distance | Proposes a gradient-matching loss distillation method for point cloud completion, using a weighted Chamfer distance. | distillation scene understanding | ✅ | |
| 7 | DetailCLIP: Detail-Oriented CLIP for Fine-Grained Tasks | DetailCLIP: a detail-oriented CLIP model that improves performance on fine-grained segmentation tasks. | contrastive learning distillation | ✅ | |
| 8 | Learning Generative Interactive Environments By Trained Agent Exploration | Proposes GenieRedux, a generative interactive environment model built on reinforcement-learning exploration, improving visual fidelity and controllability. | reinforcement learning world model | ✅ | |
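🔬 Pillar 9: Embodied Foundation Models (4 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 9 | LIME: Less Is More for MLLM Evaluation | LIME: a streamlined evaluation benchmark for multimodal large language models that improves efficiency and discriminative power. | large language model multimodal | ✅ | |
| 10 | Enhancing Long Video Understanding via Hierarchical Event-Based Memory | Proposes HEM-LLM, an LLM enhanced with hierarchical event-based memory, to improve long video understanding. | large language model foundation model | | |
| 11 | Shadow Removal Refinement via Material-Consistent Shadow Edges | Proposes a shadow removal refinement method based on material-consistent shadow edges. | foundation model | | |
| 12 | Aligning Machine and Human Visual Representations across Abstraction Levels | Proposes a method for aligning machine and human visual representations, improving model generalization and robustness. | foundation model | | |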
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 13 | Human Motion Synthesis_ A Diffusion Approach for Motion Stitching and In-Betweening | Proposes a diffusion-based method for motion stitching and in-betweening that generates realistic, smooth human motion. | motion synthesis motion generation multimodal | | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 14 | Test-Time Certifiable Self-Supervision to Bridge the Sim2Real Gap in Event-Based Satellite Pose Estimation | Proposes a test-time self-supervised method for event-camera satellite pose estimation that bridges the Sim2Real gap. | sim2real | | |
🔬 Pillar 6: Video Extraction (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 15 | When to Extract ReID Features: A Selective Approach for Improved Multiple Object Tracking | Proposes a selective ReID feature extraction method that reduces computational overhead and improves accuracy in multiple object tracking. | feature matching | ✅ | |