cs.CV (2024-10-19)
📊 15 papers in total | 🔗 8 with code
🎯 Interest area navigation
Pillar 9: Embodied Foundation Models (6 🔗3)
Pillar 2: RL & Architecture (4 🔗3)
Pillar 3: Perception & Semantics (4 🔗2)
Pillar 1: Robot Control (1)
🔬 Pillar 9: Embodied Foundation Models (6 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | A Multimodal Vision Foundation Model for Clinical Dermatology | PanDerm: a multimodal vision foundation model for clinical dermatology | foundation model, multimodal | ✅ | |
| 2 | Automated Segmentation and Analysis of Cone Photoreceptors in Multimodal Adaptive Optics Imaging | Proposes a U-Net-based method for automatic segmentation of retinal cone cells to aid diagnosis of eye diseases | multimodal | | |
| 3 | Group Diffusion Transformers are Unsupervised Multitask Learners | Proposes Group Diffusion Transformers (GDTs) for unsupervised multitask visual generation, addressing existing methods' reliance on specific datasets | large language model, multimodal | | |
| 4 | BYOCL: Build Your Own Consistent Latent with Hierarchical Representative Latent Clustering | BYOCL: builds a consistent latent space through hierarchical representative latent clustering, resolving SAM's semantic inconsistency in image-sequence segmentation | foundation model | ✅ | |
| 5 | Making Every Frame Matter: Continuous Activity Recognition in Streaming Video via Adaptive Video Context Modeling | CARS: continuous activity recognition in streaming video via adaptive video context modeling | embodied AI | | |
| 6 | Reflexive Guidance: Improving OoDD in Vision-Language Models via Self-Guided Image-Adaptive Concept Generation | Proposes Reflexive Guidance, which improves out-of-distribution detection (OoDD) in vision-language models through self-guided image-adaptive concept generation | foundation model | ✅ | |
🔬 Pillar 2: RL & Architecture (4 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 7 | Spatial-Mamba: Effective Visual State Space Models via Structure-aware State Fusion | Proposes Spatial-Mamba, which models visual state spaces effectively through structure-aware state fusion to improve image understanding | Mamba, SSM, state space model | ✅ | |
| 8 | LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound | LLaVA-Ultra: a Chinese multimodal large language model for ultrasound imaging | distillation, large language model, multimodal | | |
| 9 | Adversarial Score identity Distillation: Rapidly Surpassing the Teacher in One Step | Proposes SiDA: an adversarial distillation method for image generation that surpasses the teacher model in a single step | distillation, classifier-free guidance | ✅ | |
| 10 | MambaSOD: Dual Mamba-Driven Cross-Modal Fusion Network for RGB-D Salient Object Detection | Proposes MambaSOD, a dual Mamba-driven cross-modal fusion network for RGB-D salient object detection | Mamba | ✅ | |
🔬 Pillar 3: Perception & Semantics (4 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 11 | DCDepth: Progressive Monocular Depth Estimation in Discrete Cosine Domain | DCDepth: progressive monocular depth estimation in the discrete cosine domain | depth estimation, monocular depth | ✅ | |
| 12 | GL-NeRF: Gauss-Laguerre Quadrature Enables Training-Free NeRF Acceleration | Proposes GL-NeRF, which accelerates NeRF volume rendering | NeRF, neural radiance field | | |
| 13 | Neural Radiance Field Image Refinement through End-to-End Sampling Point Optimization | Proposes a NeRF image refinement method based on end-to-end sampling point optimization to improve rendering quality | NeRF, neural radiance field | | |
| 14 | Part-Whole Relational Fusion Towards Multi-Modal Scene Understanding | Proposes the Part-Whole Relational Fusion framework to address modality fusion in multi-modal scene understanding | scene understanding | ✅ | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 15 | SLIC: Secure Learned Image Codec through Compressed Domain Watermarking to Defend Image Manipulation | Proposes SLIC: a learned image codec secured by compressed-domain watermarking to defend against image manipulation | manipulation | | |