cs.CV (2024-10-26)
📊 5 papers in total
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (3)
Pillar 2: RL & Architecture (1)
Pillar 3: Perception & Semantics (1)
🔬 Pillar 9: Embodied Foundation Models (3 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | You Never Know: Quantization Induces Inconsistent Biases in Vision-Language Foundation Models | Quantization induces inconsistent social-fairness biases in vision-language models: a comparative study | foundation model | | |
| 2 | GiVE: Guiding Visual Encoder to Perceive Overlooked Information | Proposes GiVE to address the problem of visual encoders overlooking information | large language model, multimodal | | |
| 3 | Adaptive Video Understanding Agent: Enhancing efficiency with dynamic frame sampling and feedback-driven reasoning | Proposes an adaptive video understanding agent that improves efficiency through dynamic frame sampling and feedback-driven reasoning | large language model | | |
🔬 Pillar 2: RL & Architecture (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 4 | Human-Object Interaction Detection Collaborated with Large Relation-driven Diffusion Models | Proposes DIFFUSIONHOI, which uses relation-driven diffusion models to improve human-object interaction detection performance | contrastive learning, human-object interaction, HOI | | |
🔬 Pillar 3: Perception & Semantics (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 5 | SCube: Instant Large-Scale Scene Reconstruction using VoxSplats | SCube: fast reconstruction of large-scale scenes using VoxSplats | scene reconstruction | | |