cs.CV (2024-10-19)

📊 15 papers in total | 🔗 8 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (6 🔗3) · Pillar 2: RL & Architecture (4 🔗3) · Pillar 3: Perception & Semantics (4 🔗2) · Pillar 1: Robot Control (1)

🔬 Pillar 9: Embodied Foundation Models (6 papers)

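| # | Title | One-line summary | Tags |
|---|-------|------------------|------|
| 1 | A Multimodal Vision Foundation Model for Clinical Dermatology | PanDerm: a multimodal vision foundation model for clinical dermatology | foundation model, multimodal |
| 2 | Automated Segmentation and Analysis of Cone Photoreceptors in Multimodal Adaptive Optics Imaging | Proposes a U-Net-based method for automatic segmentation of retinal cone photoreceptors to aid the diagnosis of eye diseases | multimodal |
| 3 | Group Diffusion Transformers are Unsupervised Multitask Learners | Proposes Group Diffusion Transformers (GDTs) for unsupervised multitask visual generation, addressing existing methods' reliance on specific datasets | large language model, multimodal |
| 4 | BYOCL: Build Your Own Consistent Latent with Hierarchical Representative Latent Clustering | BYOCL: builds a consistent latent space through hierarchical representative latent clustering, addressing SAM's semantic inconsistency when segmenting image sequences | foundation model |
| 5 | Making Every Frame Matter: Continuous Activity Recognition in Streaming Video via Adaptive Video Context Modeling | CARS: continuous activity recognition in streaming video via adaptive video context modeling | embodied AI |
| 6 | Reflexive Guidance: Improving OoDD in Vision-Language Models via Self-Guided Image-Adaptive Concept Generation | Proposes Reflexive Guidance, which improves out-of-distribution detection (OoDD) in vision-language models through self-guided, image-adaptive concept generation | foundation model |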

🔬 Pillar 2: RL & Architecture (4 papers)

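| # | Title | One-line summary | Tags |
|---|-------|------------------|------|
| 7 | Spatial-Mamba: Effective Visual State Space Models via Structure-aware State Fusion | Proposes Spatial-Mamba, which models visual state spaces effectively through structure-aware state fusion, improving image understanding | Mamba, SSM, state space model |
| 8 | LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound | LLaVA-Ultra: a Chinese multimodal large language model for ultrasound imaging | distillation, large language model, multimodal |
| 9 | Adversarial Score identity Distillation: Rapidly Surpassing the Teacher in One Step | Proposes SiDA: an image generation method that uses adversarial distillation to surpass the teacher model in a single step | distillation, classifier-free guidance |
| 10 | MambaSOD: Dual Mamba-Driven Cross-Modal Fusion Network for RGB-D Salient Object Detection | Proposes MambaSOD, a dual-Mamba-driven cross-modal fusion network for RGB-D salient object detection | Mamba |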

🔬 Pillar 3: Perception & Semantics (4 papers)

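| # | Title | One-line summary | Tags |
|---|-------|------------------|------|
| 11 | DCDepth: Progressive Monocular Depth Estimation in Discrete Cosine Domain | DCDepth: progressive monocular depth estimation in the discrete cosine domain | depth estimation, monocular depth |
| 12 | GL-NeRF: Gauss-Laguerre Quadrature Enables Training-Free NeRF Acceleration | Proposes GL-NeRF to accelerate NeRF volume rendering | NeRF, neural radiance field |
| 13 | Neural Radiance Field Image Refinement through End-to-End Sampling Point Optimization | Proposes a NeRF image refinement method based on end-to-end sampling point optimization, improving rendering quality | NeRF, neural radiance field |
| 14 | Part-Whole Relational Fusion Towards Multi-Modal Scene Understanding | Proposes the Part-Whole Relational Fusion framework to address modality fusion in multimodal scene understanding | scene understanding |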

🔬 Pillar 1: Robot Control (1 paper)

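| # | Title | One-line summary | Tags |
|---|-------|------------------|------|
| 15 | SLIC: Secure Learned Image Codec through Compressed Domain Watermarking to Defend Image Manipulation | Proposes SLIC: a secure learned image codec that defends against image manipulation through compressed-domain watermarking | manipulation |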
