cs.CV (2024-09-27)
📊 26 papers in total | 🔗 6 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (10 🔗2)
Pillar 2: RL & Architecture (6)
Pillar 3: Perception & Semantics (3 🔗2)
Pillar 1: Robot Control (3)
Pillar 8: Physics-based Animation (1)
Pillar 7: Motion Retargeting (1)
Pillar 4: Generative Motion (1 🔗1)
Pillar 6: Video Extraction (1 🔗1)
🔬 Pillar 9: Embodied Foundation Models (10 papers)
🔬 Pillar 2: RL & Architecture (6 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 11 | Exploring Token Pruning in Vision State Space Models | Proposes a new token pruning method for vision state space models to improve efficiency. | Mamba SSM state space model | | |
| 12 | MiniVLN: Efficient Vision-and-Language Navigation by Progressive Knowledge Distillation | MiniVLN: efficient vision-and-language navigation via progressive knowledge distillation. | distillation embodied AI VLN | | |
| 13 | How Effective is Pre-training of Large Masked Autoencoders for Downstream Earth Observation Tasks? | Studies how effective pre-training of large masked autoencoders is for downstream Earth observation tasks. | masked autoencoder MAE foundation model | | |
| 14 | Student-Oriented Teacher Knowledge Refinement for Knowledge Distillation | Proposes a student-oriented knowledge refinement method to improve knowledge distillation. | distillation | | |
| 15 | Harmonizing knowledge Transfer in Neural Network with Unified Distillation | Proposes a unified distillation framework that integrates knowledge transfer across multiple network layers to improve model performance. | distillation | | |
| 16 | You Only Speak Once to See | Proposes the YOSS model, which uses audio guidance to localize objects in images. | contrastive learning scene understanding | | |
🔬 Pillar 3: Perception & Semantics (3 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 17 | Space-time 2D Gaussian Splatting for Accurate Surface Reconstruction under Complex Dynamic Scenes | Proposes space-time 2D Gaussian splatting for accurate surface reconstruction in complex dynamic scenes. | gaussian splatting splatting human-object interaction | ✅ | |
| 18 | Search3D: Hierarchical Open-Vocabulary 3D Segmentation | Proposes Search3D for hierarchical open-vocabulary 3D segmentation and search. | open-vocabulary open vocabulary | | |
| 19 | Gaussian Heritage: 3D Digitization of Cultural Heritage with Integrated Object Segmentation | Proposes a Gaussian-splatting-based method for 3D digitization and automatic segmentation of cultural heritage. | gaussian splatting splatting | ✅ | |
🔬 Pillar 1: Robot Control (3 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 20 | SinoSynth: A Physics-based Domain Randomization Approach for Generalizable CBCT Image Enhancement | SinoSynth: a physics-based domain randomization method for generalizable CBCT image enhancement. | domain randomization | | |
| 21 | S2O: Static to Openable Enhancement for Articulated 3D Objects | Proposes the S2O framework, which generates interactive, openable 3D objects from static 3D objects for robot manipulation and embodied AI. | manipulation embodied AI | | |
| 22 | Spectral Wavelet Dropout: Regularization in the Wavelet Domain | Proposes Spectral Wavelet Dropout (SWD), which improves CNN generalization through wavelet-domain regularization. | manipulation | | |
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 23 | From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding | Survey paper: applications and challenges of multimodal large language models in long video understanding. | spatiotemporal large language model multimodal | | |
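🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 24 | UniCal: Unified Neural Sensor Calibration | UniCal: proposes a unified neural sensor calibration framework that addresses multi-sensor calibration for autonomous vehicles. | geometric consistency | | |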
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 25 | PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation | PhysGen: an image-to-video generation method grounded in rigid-body physics that produces realistic, controllable videos. | physically plausible | ✅ | |
🔬 Pillar 6: Video Extraction (1 paper)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 26 | Multi-hypotheses Conditioned Point Cloud Diffusion for 3D Human Reconstruction from Occluded Images | Proposes MHCDIFF for reconstructing detailed 3D humans from occluded images. | SMPL | ✅ | |