cs.CV (2024-08-02)

📊 19 papers in total | 🔗 9 with code

🎯 Navigation by Area of Interest

Pillar 2: RL Algorithms & Architecture (11 papers · 🔗 6 with code)
Pillar 9: Embodied Foundation Models (5 papers · 🔗 3 with code)
Pillar 3: Spatial Perception & Semantics (2 papers)
Pillar 1: Robot Control (1 paper)

🔬 Pillar 2: RL Algorithms & Architecture (11 papers)

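| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Wave-Mamba: Wavelet State Space Model for Ultra-High-Definition Low-Light Image Enhancement | Proposes Wave-Mamba, which combines the wavelet transform with a state space model for ultra-high-definition low-light image enhancement. | Mamba, SSM, state space model | |
| 2 | Spatial and Spatial-Spectral Morphological Mamba for Hyperspectral Image Classification | Proposes morphological spatial and spatial-spectral Mamba models to make hyperspectral image classification more efficient. | Mamba, state space model, HSI | |
| 3 | Multi-head Spatial-Spectral Mamba for Hyperspectral Image Classification | Proposes the multi-head spatial-spectral Mamba model (MHSSMamba) for hyperspectral image classification, improving accuracy. | Mamba, SSM, HSI | |
| 4 | NOLO: Navigate Only Look Once | NOLO: navigates after looking only once, using the in-context learning ability of Transformers to solve video-based navigation. | reinforcement learning, offline reinforcement learning, optical flow | |
| 5 | WaveMamba: Spatial-Spectral Wavelet Mamba for Hyperspectral Image Classification | WaveMamba: a spatial-spectral wavelet Mamba model for hyperspectral image classification. | Mamba, HSI | |
| 6 | PhysMamba: State Space Duality Model for Remote Physiological Measurement | PhysMamba: a remote physiological measurement model based on state space duality, improving robustness in noisy environments. | Mamba, state space model | |
| 7 | MambaST: A Plug-and-Play Cross-Spectral Spatial-Temporal Fuser for Efficient Pedestrian Detection | Proposes MambaST, a plug-and-play cross-spectral spatio-temporal fusion framework for efficient pedestrian detection. | Mamba, state space model | |
| 8 | POA: Pre-training Once for Models of All Sizes | Proposes POA, in which a single pre-training run yields models of all sizes, easing deployment. | representation learning, distillation, foundation model | |
| 9 | Balanced Residual Distillation Learning for 3D Point Cloud Class-Incremental Semantic Segmentation | Proposes a balanced residual distillation learning framework that addresses catastrophic forgetting and class bias in 3D point cloud class-incremental semantic segmentation. | distillation | |
| 10 | A General Framework to Boost 3D GS Initialization for Text-to-3D Generation by Lexical Richness | Proposes a general framework that uses lexical richness to improve the quality of 3D Gaussian initialization in text-to-3D generation. | dreamer, splatting | |
| 11 | Exploiting the Semantic Knowledge of Pre-trained Text-Encoders for Continual Learning | Proposes a continual learning method that exploits the semantic knowledge of pre-trained text encoders to improve knowledge retention. | representation learning, distillation | |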

🔬 Pillar 9: Embodied Foundation Models (5 papers)

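| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 12 | Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed Inputs | Proposes the Hallu-PI benchmark for evaluating hallucination in multimodal large language models under perturbed inputs. | large language model | |
| 13 | StitchFusion: Weaving Any Visual Modalities to Enhance Multimodal Semantic Segmentation | Proposes StitchFusion, which weaves together arbitrary visual modalities to enhance multimodal semantic segmentation. | multimodal | |
| 14 | An Efficient and Effective Transformer Decoder-Based Framework for Multi-Task Visual Grounding | Proposes an efficient Transformer decoder-based framework for multi-task visual grounding that reduces its high computational cost. | visual grounding | |
| 15 | Boosting Gaze Object Prediction via Pixel-level Supervision from Vision Foundation Model | Uses pixel-level supervision from a vision foundation model to improve gaze object prediction. | foundation model | |
| 16 | SceneMotion: From Agent-Centric Embeddings to Scene-Wide Forecasts | SceneMotion: scene-wide motion forecasting built on agent-centric embeddings. | multimodal | |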

🔬 Pillar 3: Spatial Perception & Semantics (2 papers)

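| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 17 | IG-SLAM: Instant Gaussian SLAM | IG-SLAM: instant SLAM based on Gaussian splatting, improving the speed and accuracy of RGB-D SLAM. | 3D gaussian splatting, gaussian splatting, splatting | |
| 18 | Embodiment: Self-Supervised Depth Estimation Based on Camera Models | Proposes a camera-model-based self-supervised depth estimation method that improves monocular depth estimation. | depth estimation, monocular depth | |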

🔬 Pillar 1: Robot Control (1 paper)

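| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 19 | Multi-task SAR Image Processing via GAN-based Unsupervised Manipulation | Proposes GUE, a GAN-based unsupervised editing framework for multi-task SAR image processing. | manipulation | |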
