cs.CV (2024-09-05)

📊 18 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (6 🔗2) · Pillar 3: Perception & Semantics (5 🔗2) · Pillar 2: RL & Architecture (4 🔗1) · Pillar 1: Robot Control (2) · Pillar 4: Generative Motion (1 🔗1)

🔬 Pillar 9: Embodied Foundation Models (6 papers)

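| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 1 | Foundation Model or Finetune? Evaluation of few-shot semantic segmentation for river pollution | For semantic segmentation of river pollution, the study shows that fine-tuned models outperform foundation models. | foundation model |
| 2 | Tissue Concepts: supervised foundation models in computational pathology | Proposes Tissue Concepts, a pretrained model for computational pathology based on supervised learning. | foundation model |
| 3 | Few-shot Adaptation of Medical Vision-Language Models | Proposes an efficient few-shot adaptation benchmark and method for medical vision-language models. | foundation model, zero-shot transfer |
| 4 | TC-LLaVA: Rethinking the Transfer from Image to Video Understanding with Temporal Considerations | TC-LLaVA: enhances the LLM with temporal modeling to improve transfer from image to video understanding. | large language model, multimodal |
| 5 | MVTN: A Multiscale Video Transformer Network for Hand Gesture Recognition | Proposes MVTN, a multiscale video Transformer network, to improve dynamic hand gesture recognition accuracy. | multimodal |
| 6 | Have Large Vision-Language Models Mastered Art History? | Evaluates how well large vision-language models have mastered art history. | multimodal |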

🔬 Pillar 3: Perception & Semantics (5 papers)

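| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 7 | LM-Gaussian: Boost Sparse-view 3D Gaussian Splatting with Large Model Priors | LM-Gaussian: uses large-model priors to improve sparse-view 3D Gaussian Splatting reconstruction. | 3D gaussian splatting, 3DGS, gaussian splatting |
| 8 | FrozenSeg: Harmonizing Frozen Foundation Models for Open-Vocabulary Segmentation | FrozenSeg: combines frozen pretrained models to achieve open-vocabulary segmentation. | open-vocabulary, open vocabulary, foundation model |
| 9 | Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding | Lexicon3D: probes the capabilities and limitations of visual foundation models in complex 3D scene understanding. | scene understanding, foundation model, visual grounding |
| 10 | Weight Conditioning for Smooth Optimization of Neural Networks | Proposes a weight conditioning method that improves model performance by smoothing neural network optimization. | NeRF, neural radiance field |
| 11 | Estimating Indoor Scene Depth Maps from Ultrasonic Echoes | Proposes a method for estimating indoor scene depth from ultrasonic echoes, with training assisted by audible sound. | depth estimation |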

🔬 Pillar 2: RL & Architecture (4 papers)

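| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 12 | Optimizing 3D Gaussian Splatting for Sparse Viewpoint Scene Reconstruction | Proposes the SVS-GS framework to optimize 3D Gaussian Splatting for sparse-view scene reconstruction. | distillation, 3D gaussian splatting, 3DGS |
| 13 | UV-Mamba: A DCN-Enhanced State Space Model for Urban Village Boundary Identification in High-Resolution Remote Sensing Images | UV-Mamba: a DCN-enhanced state space model for identifying urban village boundaries in high-resolution remote sensing images. | Mamba, state space model |
| 14 | Data-Efficient Generation for Dataset Distillation | Proposes a dataset distillation method based on a conditional latent diffusion model, improving synthetic image quality and distillation efficiency. | distillation |
| 15 | Granular-ball Representation Learning for Deep CNN on Learning with Label Noise | Proposes a deep CNN model based on granular-ball representation learning, improving robustness under noisy labels. | representation learning |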

🔬 Pillar 1: Robot Control (2 papers)

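| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 16 | OccLLaMA: An Occupancy-Language-Action Generative World Model for Autonomous Driving | OccLLaMA: an occupancy-language-action generative world model for autonomous driving. | motion planning, world model, VQ-VAE |
| 17 | Non-Uniform Illumination Attack for Fooling Convolutional Neural Networks | Proposes the non-uniform illumination (NUI) attack to evaluate and improve CNN robustness in image classification. | manipulation |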

🔬 Pillar 4: Generative Motion (1 paper)

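| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 18 | HUMOS: Human Motion Model Conditioned on Body Shape | Proposes HUMOS, which generates more realistic human motion conditioned on body shape, addressing the neglect of body-shape differences in existing methods. | physically plausible |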
