cs.CV (2024-06-25)

📊 17 papers in total | 🔗 7 with code

🎯 Areas of Interest

Pillar 2: RL & Architecture (8 🔗3) · Pillar 9: Embodied Foundation Models (5 🔗2) · Pillar 1: Robot Control (2 🔗2) · Pillar 5: Interaction & Reaction (1) · Pillar 3: Perception & Semantics (1)

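🔬 Pillar 2: RL & Architecture (8 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	Semi-supervised classification of dental conditions in panoramic radiographs using large language model and instance segmentation: A real-world dataset evaluation	Proposes a semi-supervised learning framework based on a large language model and instance segmentation for classifying dental conditions in panoramic X-ray images.	masked autoencoder, large language model
2	MAGIC: Meta-Ability Guided Interactive Chain-of-Distillation for Effective-and-Efficient Vision-and-Language Navigation	Proposes MAGIC, a meta-ability guided interactive chain-of-distillation method for efficient vision-and-language navigation.	teacher-student distillation, VLN	✅
3	Pamba: Enhancing Global Interaction in Point Clouds via State Space Model	Proposes Pamba, which uses a state space model to strengthen global interaction in point clouds for efficient semantic segmentation.	Mamba, SSM, state space model
4	Highly Constrained Coded Aperture Imaging Systems Design Via a Knowledge Distillation Approach	Proposes a knowledge-distillation-based design method for coded aperture imaging systems that optimizes performance under physical constraints.	teacher-student distillation
5	Pseudo Labelling for Enhanced Masked Autoencoders	Proposes a masked autoencoder enhanced with pseudo labels to improve image representation learning.	masked autoencoder, MAE
6	Towards Optimal Trade-offs in Knowledge Distillation for CNNs and Vision Transformers at the Edge	Studies optimal trade-off strategies for knowledge distillation of CNNs and ViTs on edge devices.	distillation
7	Three-Stream Temporal-Shift Attention Network Based on Self-Knowledge Distillation for Micro-Expression Recognition	Proposes a three-stream temporal-shift attention network trained with self-knowledge distillation to improve micro-expression recognition.	distillation	✅
8	Video Occupancy Models	Proposes video occupancy models that support prediction for control tasks.	world model, predictive model	✅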

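🔬 Pillar 9: Embodied Foundation Models (5 papers)

#	Title	One-line summary	Tags	🔗	⭐
9	Tell Me Where You Are: Multimodal LLMs Meet Place Recognition	Proposes a visual localization method based on multimodal LLMs that improves robot localization accuracy.	large language model, foundation model, multimodal
10	MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization	Proposes MSRS, a sparse mask optimization method for training efficient multimodal speech recognition models from scratch.	multimodal
11	MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning	MG-LLaVA: a multimodal large language model for multi-granularity visual instruction tuning.	large language model, multimodal	✅
12	MLLM as Video Narrator: Mitigating Modality Imbalance in Video Moment Retrieval	Proposes an MLLM-based video narration approach that mitigates modality imbalance in video moment retrieval.	large language model
13	Point-SAM: Promptable 3D Segmentation Model for Point Clouds	Proposes Point-SAM, a promptable 3D segmentation model for point clouds.	foundation model	✅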

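🔬 Pillar 1: Robot Control (2 papers)

#	Title	One-line summary	Tags	🔗	⭐
14	Video Inpainting Localization with Contrastive Learning	Proposes ViLocal, a contrastive-learning-based method that localizes inpainted regions in videos to detect forged videos.	manipulation, contrastive learning, spatiotemporal	✅
15	MotionBooth: Motion-Aware Customized Text-to-Video Generation	MotionBooth: a motion-aware, customizable text-to-video generation framework.	manipulation	✅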

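🔬 Pillar 5: Interaction & Reaction (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
16	Local-to-Global Cross-Modal Attention-Aware Fusion for HSI-X Semantic Segmentation	Proposes LoGoCAF, a framework for semantic segmentation that fuses hyperspectral images (HSI) with data from another modality (X).	HSI, multimodal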

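🔬 Pillar 3: Perception & Semantics (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
17	Implicit-Zoo: A Large-Scale Dataset of Neural Implicit Functions for 2D Images and 3D Scenes	Proposes Implicit-Zoo, a large-scale dataset that supports research on and applications of neural implicit functions for 2D images and 3D scenes.	NeRF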
