cs.CV (2024-09-05)
📊 18 papers in total | 🔗 6 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (6 papers, 🔗 2 with code)
Pillar 3: Spatial Perception & Semantics (5 papers, 🔗 2 with code)
Pillar 2: RL Algorithms & Architecture (4 papers, 🔗 1 with code)
Pillar 1: Robot Control (2 papers)
Pillar 4: Generative Motion (1 paper, 🔗 1 with code)
🔬 Pillar 9: Embodied Foundation Models (6 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | Foundation Model or Finetune? Evaluation of few-shot semantic segmentation for river pollution | For few-shot semantic segmentation of river pollution, finds that fine-tuned models outperform foundation models. | foundation model | | |
| 2 | Tissue Concepts: supervised foundation models in computational pathology | Proposes Tissue Concepts, a supervised pre-trained foundation model for computational pathology. | foundation model | | |
| 3 | Few-shot Adaptation of Medical Vision-Language Models | Proposes an efficient few-shot adaptation benchmark and method for medical vision-language models. | foundation model, zero-shot transfer | ✅ | |
| 4 | TC-LLaVA: Rethinking the Transfer from Image to Video Understanding with Temporal Considerations | TC-LLaVA: enhances the LLM with temporal modeling to improve transfer from image understanding to video understanding. | large language model, multimodal | | |
| 5 | MVTN: A Multiscale Video Transformer Network for Hand Gesture Recognition | Proposes MVTN, a multiscale video Transformer network that improves dynamic hand gesture recognition accuracy. | multimodal | ✅ | |
| 6 | Have Large Vision-Language Models Mastered Art History? | Evaluates how well large vision-language models have mastered art history. | multimodal | | |
🔬 Pillar 3: Spatial Perception & Semantics (5 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 7 | LM-Gaussian: Boost Sparse-view 3D Gaussian Splatting with Large Model Priors | LM-Gaussian: uses large-model priors to improve sparse-view 3D Gaussian Splatting reconstruction. | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 8 | FrozenSeg: Harmonizing Frozen Foundation Models for Open-Vocabulary Segmentation | FrozenSeg: combines frozen foundation models to perform open-vocabulary segmentation. | open-vocabulary, open vocabulary, foundation model | ✅ | |
| 9 | Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding | Lexicon3D: probes the capabilities and limitations of visual foundation models in complex 3D scene understanding. | scene understanding, foundation model, visual grounding | ✅ | |
| 10 | Weight Conditioning for Smooth Optimization of Neural Networks | Proposes weight conditioning, which smooths the optimization of neural networks to improve model performance. | NeRF, neural radiance field | | |
| 11 | Estimating Indoor Scene Depth Maps from Ultrasonic Echoes | Proposes a method for estimating indoor scene depth from ultrasonic echoes, using audible sound to assist training. | depth estimation | | |
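🔬 Pillar 2: RL Algorithms & Architecture (4 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 12 | Optimizing 3D Gaussian Splatting for Sparse Viewpoint Scene Reconstruction | Proposes the SVS-GS framework, which optimizes 3D Gaussian Splatting for scene reconstruction from sparse viewpoints. | distillation, 3D gaussian splatting, 3DGS | | |
| 13 | UV-Mamba: A DCN-Enhanced State Space Model for Urban Village Boundary Identification in High-Resolution Remote Sensing Images | UV-Mamba: a DCN-enhanced state space model for identifying urban village boundaries in high-resolution remote sensing images. | Mamba, state space model | ✅ | |
| 14 | Data-Efficient Generation for Dataset Distillation | Proposes a dataset distillation method based on a conditional latent diffusion model, improving synthetic image quality and distillation efficiency. | distillation | | |
| 15 | Granular-ball Representation Learning for Deep CNN on Learning with Label Noise | Proposes a deep CNN with granular-ball representation learning to improve robustness when training on data with noisy labels. | representation learning | | |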
🔬 Pillar 1: Robot Control (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 16 | OccLLaMA: An Occupancy-Language-Action Generative World Model for Autonomous Driving | OccLLaMA: an occupancy-language-action generative world model for autonomous driving. | motion planning, world model, VQ-VAE | | |
| 17 | Non-Uniform Illumination Attack for Fooling Convolutional Neural Networks | Proposes the Non-Uniform Illumination (NUI) attack to evaluate and improve the robustness of CNNs in image classification. | manipulation | | |
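🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 18 | HUMOS: Human Motion Model Conditioned on Body Shape | Proposes HUMOS, a human motion model conditioned on body shape that generates more realistic motion, addressing existing methods' neglect of body-shape differences. | physically plausible | ✅ | |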