cs.CV (2024-10-19)

📊 15 papers in total | 🔗 8 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (6 🔗3) · Pillar 2: RL & Architecture (4 🔗3) · Pillar 3: Perception & Semantics (4 🔗2) · Pillar 1: Robot Control (1)

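🔬 Pillar 9: Embodied Foundation Models (6 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	A Multimodal Vision Foundation Model for Clinical Dermatology	PanDerm: a multimodal vision foundation model for clinical dermatology	foundation model multimodal	✅
2	Automated Segmentation and Analysis of Cone Photoreceptors in Multimodal Adaptive Optics Imaging	Proposes a U-Net-based method for automatic segmentation of retinal cone photoreceptors, supporting the diagnosis of eye disease	multimodal
3	Group Diffusion Transformers are Unsupervised Multitask Learners	Proposes Group Diffusion Transformers (GDTs) for unsupervised multitask visual generation, addressing existing methods' reliance on task-specific datasets	large language model multimodal
4	BYOCL: Build Your Own Consistent Latent with Hierarchical Representative Latent Clustering	BYOCL: builds a consistent latent space through hierarchical representative latent clustering, addressing SAM's semantic inconsistency when segmenting image sequences	foundation model	✅
5	Making Every Frame Matter: Continuous Activity Recognition in Streaming Video via Adaptive Video Context Modeling	CARS: continuous activity recognition in streaming video via adaptive video context modeling	embodied AI
6	Reflexive Guidance: Improving OoDD in Vision-Language Models via Self-Guided Image-Adaptive Concept Generation	Proposes Reflexive Guidance, which improves out-of-distribution detection (OoDD) in vision-language models through self-guided image-adaptive concept generation	foundation model	✅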

🔬 Pillar 2: RL & Architecture (4 papers)

#	Title	One-line summary	Tags	🔗	⭐
7	Spatial-Mamba: Effective Visual State Space Models via Structure-aware State Fusion	Proposes Spatial-Mamba, which models visual state spaces effectively through structure-aware state fusion, improving image understanding	Mamba SSM state space model	✅
8	LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound	LLaVA-Ultra: a Chinese multimodal large language model for ultrasound imaging	distillation large language model multimodal
9	Adversarial Score identity Distillation: Rapidly Surpassing the Teacher in One Step	Proposes SiDA: an image generation method that uses adversarial distillation to surpass the teacher model in a single step	distillation classifier-free guidance	✅
10	MambaSOD: Dual Mamba-Driven Cross-Modal Fusion Network for RGB-D Salient Object Detection	Proposes MambaSOD, a dual-Mamba-driven cross-modal fusion network for RGB-D salient object detection	Mamba	✅

🔬 Pillar 3: Perception & Semantics (4 papers)

#	Title	One-line summary	Tags	🔗	⭐
11	DCDepth: Progressive Monocular Depth Estimation in Discrete Cosine Domain	DCDepth: progressive monocular depth estimation in the discrete cosine domain	depth estimation monocular depth	✅
12	GL-NeRF: Gauss-Laguerre Quadrature Enables Training-Free NeRF Acceleration	Proposes GL-NeRF, which uses Gauss-Laguerre quadrature to accelerate NeRF volume rendering without additional training	NeRF neural radiance field
13	Neural Radiance Field Image Refinement through End-to-End Sampling Point Optimization	Proposes a NeRF image refinement method based on end-to-end sampling point optimization, improving rendering quality	NeRF neural radiance field
14	Part-Whole Relational Fusion Towards Multi-Modal Scene Understanding	Proposes the Part-Whole Relational Fusion framework to address modality fusion in multi-modal scene understanding	scene understanding	✅

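🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
15	SLIC: Secure Learned Image Codec through Compressed Domain Watermarking to Defend Image Manipulation	Proposes SLIC: a learned image codec secured by compressed-domain watermarking to defend against image manipulation	manipulation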
