cs.CV (2024-09-05)

📊 18 papers total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (6 🔗2) | Pillar 3: Perception & Semantics (5 🔗2) | Pillar 2: RL & Architecture (4 🔗1) | Pillar 1: Robot Control (2) | Pillar 4: Generative Motion (1 🔗1)

🔬 Pillar 9: Embodied Foundation Models (6 papers)

#	Title	Key point	Tags	🔗	⭐
1	Foundation Model or Finetune? Evaluation of few-shot semantic segmentation for river pollution	For semantic segmentation of river pollution, shows that fine-tuned models outperform foundation models	foundation model
2	Tissue Concepts: supervised foundation models in computational pathology	Proposes Tissue Concepts, a pretrained model for computational pathology based on supervised learning	foundation model
3	Few-shot Adaptation of Medical Vision-Language Models	Proposes an efficient few-shot adaptation benchmark and method for medical vision-language models	foundation model zero-shot transfer	✅
4	TC-LLaVA: Rethinking the Transfer from Image to Video Understanding with Temporal Considerations	TC-LLaVA: strengthens the LLM with temporal modeling to improve transfer from image to video understanding	large language model multimodal
5	MVTN: A Multiscale Video Transformer Network for Hand Gesture Recognition	Proposes MVTN, a multiscale video Transformer network, to improve dynamic hand gesture recognition accuracy	multimodal	✅
6	Have Large Vision-Language Models Mastered Art History?	Evaluates how well large vision-language models have mastered art history	multimodal

🔬 Pillar 3: Perception & Semantics (5 papers)

#	Title	Key point	Tags	🔗	⭐
7	LM-Gaussian: Boost Sparse-view 3D Gaussian Splatting with Large Model Priors	LM-Gaussian: uses large-model priors to improve sparse-view 3D Gaussian Splatting reconstruction	3D gaussian splatting 3DGS gaussian splatting
8	FrozenSeg: Harmonizing Frozen Foundation Models for Open-Vocabulary Segmentation	FrozenSeg: combines frozen pretrained models to achieve open-vocabulary segmentation	open-vocabulary open vocabulary foundation model	✅
9	Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding	Lexicon3D: probes the capabilities and limitations of visual foundation models in complex 3D scene understanding	scene understanding foundation model visual grounding	✅
10	Weight Conditioning for Smooth Optimization of Neural Networks	Proposes a weight conditioning method that improves model performance by smoothing neural network optimization	NeRF neural radiance field
11	Estimating Indoor Scene Depth Maps from Ultrasonic Echoes	Proposes a method for estimating indoor scene depth from ultrasonic echoes, with audible sound assisting training	depth estimation

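🔬 Pillar 2: RL & Architecture (4 papers)

#	Title	Key point	Tags	🔗	⭐
12	Optimizing 3D Gaussian Splatting for Sparse Viewpoint Scene Reconstruction	Proposes the SVS-GS framework to optimize 3D Gaussian Splatting for sparse-view scene reconstruction	distillation 3D gaussian splatting 3DGS
13	UV-Mamba: A DCN-Enhanced State Space Model for Urban Village Boundary Identification in High-Resolution Remote Sensing Images	UV-Mamba: a DCN-enhanced state space model for identifying urban village boundaries in high-resolution remote sensing images	Mamba state space model	✅
14	Data-Efficient Generation for Dataset Distillation	Proposes a dataset distillation method based on a conditional latent diffusion model, improving synthetic image quality and distillation efficiency	distillation
15	Granular-ball Representation Learning for Deep CNN on Learning with Label Noise	Proposes a deep CNN model based on granular-ball representation learning, improving robustness to noisy labels	representation learning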

🔬 Pillar 1: Robot Control (2 papers)

#	Title	Key point	Tags	🔗	⭐
16	OccLLaMA: An Occupancy-Language-Action Generative World Model for Autonomous Driving	OccLLaMA: an occupancy-language-action generative world model for autonomous driving	motion planning world model VQ-VAE
17	Non-Uniform Illumination Attack for Fooling Convolutional Neural Networks	Proposes the Non-Uniform Illumination (NUI) attack to evaluate and improve CNN robustness in image classification	manipulation

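🔬 Pillar 4: Generative Motion (1 paper)

#	Title	Key point	Tags	🔗	⭐
18	HUMOS: Human Motion Model Conditioned on Body Shape	Proposes HUMOS, a human motion model conditioned on body shape that generates more realistic motion and addresses existing methods' neglect of body-shape differences	physically plausible	✅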
