cs.CV (2024-09-29)
📊 12 papers in total | 🔗 5 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (4 🔗3)
Pillar 3: Perception & Semantics (3)
Pillar 2: RL & Architecture (3 🔗2)
Pillar 4: Generative Motion (1)
Pillar 1: Robot Control (1)
🔬 Pillar 9: Embodied Foundation Models (4 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | MedViLaM: A multimodal large language model with advanced generalizability and explainability for medical data understanding and generation | MedViLaM: a multimodal large language model for medical data understanding and generation, with generalizability and explainability. | large language model, multimodal | | |
| 2 | T2Vs Meet VLMs: A Scalable Multimodal Dataset for Visual Harmfulness Recognition | Proposes VHD11K, a large-scale multimodal dataset for improving visual harmfulness recognition. | multimodal | ✅ | |
| 3 | One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos | VideoLISA: language-instructed reasoning segmentation in videos, with temporally consistent object tracking. | large language model, foundation model, multimodal | ✅ | |
| 4 | Pear: Pruning and Sharing Adapters in Visual Parameter-Efficient Fine-Tuning | Proposes the Pear framework, which prunes and shares adapters for efficient fine-tuning of pretrained vision models. | foundation model | ✅ | |
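🔬 Pillar 3: Perception & Semantics (3 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 5 | Grounding 3D Scene Affordance From Egocentric Interactions | Proposes the Ego-SAG framework, which localizes interactable regions in 3D scenes from egocentric interaction videos. | affordance, egocentric | | |
| 6 | RNG: Relightable Neural Gaussians | Proposes RNG, a relightable neural rendering method based on 3D Gaussians, suited to objects with complex shapes. | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 7 | Robust Incremental Structure-from-Motion with Hybrid Features | Proposes a robust incremental SfM system based on hybrid features, improving reconstruction in weakly textured scenes and under poorly constrained conditions. | scene reconstruction | | |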
🔬 Pillar 2: RL & Architecture (3 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | See Detail Say Clear: Towards Brain CT Report Generation via Pathological Clue-driven Representation Learning | Proposes PCRL, a pathological clue-driven representation learning model, to improve the quality of brain CT report generation. | representation learning, large language model | ✅ | |
| 9 | fCOP: Focal Length Estimation from Category-level Object Priors | Proposes fCOP, which uses category-level object priors for monocular focal length estimation. | representation learning, depth estimation, monocular depth | | |
| 10 | Hybrid Mamba for Few-Shot Segmentation | Proposes a hybrid Mamba network (HMNet) to address the under-utilization of support information in few-shot segmentation. | Mamba | ✅ | |
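🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 11 | Text-driven Human Motion Generation with Motion Masked Diffusion Model | Proposes the Motion Masked Diffusion Model (MMDM), which strengthens the learning of spatio-temporal relationships in text-driven human motion generation. | motion generation, multimodal | | |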
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 12 | Focus On What Matters: Separated Models For Visual-Based RL Generalization | Proposes SMG, which improves generalization in visual RL through separated models and a consistency loss. | manipulation, reinforcement learning, representation learning | | |