cs.CV (2024-09-28)

📊 14 papers in total | 🔗 2 with code

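🎯 Interest Area Navigation

Pillar 2: RL Algorithms and Architecture (RL & Architecture) (5 🔗1)
Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (3)
Pillar 9: Embodied Large Models (Embodied Foundation Models) (3 🔗1)
Pillar 6: Video Extraction and Matching (Video Extraction) (2)
Pillar 1: Robot Control (1)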

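🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (5 papers)

# | Title | One-line summary | Tags
1 | Contrastive ground-level image and remote sensing pre-training improves representation learning for natural world imagery | Proposes CRISP contrastive pre-training, which combines ground-level and remote sensing imagery to improve representation learning for natural world imagery. | representation learning contrastive learning multimodal
2 | CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling | Proposes CLIP-MoE to address information loss in CLIP's feature space. | contrastive learning large language model multimodal
3 | X-Prompt: Multi-modal Visual Prompt for Video Object Segmentation | Proposes the X-Prompt framework to address generality and data scarcity in multi-modal video object segmentation. | MAE foundation model
4 | Multi-Atlas Brain Network Classification through Consistency Distillation and Complementary Information Fusion | Proposes the AIDFusion network, which fuses multi-atlas information and applies consistency distillation to improve brain network classification. | distillation
5 | Beyond Euclidean: Dual-Space Representation Learning for Weakly Supervised Video Violence Detection | Proposes a dual-space representation learning method to address the recognition of ambiguous violence in weakly supervised video violence detection. | representation learning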

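🔬 Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (3 papers)

# | Title | One-line summary | Tags
6 | GS-EVT: Cross-Modal Event Camera Tracking based on Gaussian Splatting | Proposes a cross-modal event camera tracking method based on Gaussian splatting for robust localization under motion and lighting changes. | gaussian splatting splatting motion tracking
7 | G3R: Gradient Guided Generalizable Reconstruction | G3R: gradient-guided generalizable reconstruction for efficient, high-quality reconstruction of large-scale scenes. | 3DGS NeRF scene reconstruction
8 | Fast Encoding and Decoding for Implicit Video Representation | Proposes NeRV-Enc and NeRV-Dec to speed up encoding and decoding for implicit video representations. | implicit representation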

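🔬 Pillar 9: Embodied Large Models (Embodied Foundation Models) (3 papers)

# | Title | One-line summary | Tags
9 | FairPIVARA: Reducing and Assessing Biases in CLIP-Based Multimodal Models | FairPIVARA: improves the fairness of CLIP-based multimodal models by removing feature biases. | multimodal
10 | Harnessing Frozen Unimodal Encoders for Flexible Multimodal Alignment | Proposes a multimodal alignment framework built on frozen unimodal encoders, lowering the cost of developing multimodal models. | multimodal
11 | TrojVLM: Backdoor Attack Against Vision Language Models | TrojVLM: a study of backdoor attacks against vision-language models. | large language model multimodal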

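🔬 Pillar 6: Video Extraction and Matching (Video Extraction) (2 papers)

# | Title | One-line summary | Tags
12 | 1st Place Solution of Multiview Egocentric Hand Tracking Challenge ECCV2024 | Proposes a multi-view hand tracking method that combines data augmentation and post-processing, significantly improving VR interaction accuracy. | egocentric
13 | EEPNet: Efficient Edge Pixel-based Matching Network for Cross-Modal Dynamic Registration between LiDAR and Camera | EEPNet: an efficient edge pixel-based matching network for cross-modal dynamic registration between LiDAR and camera. | feature matching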

🔬 Pillar 1: Robot Control (1 paper)

# | Title | One-line summary | Tags
14 | 1st Place Solution to the 8th HANDS Workshop Challenge -- ARCTIC Track: 3DGS-based Bimanual Category-agnostic Interaction Reconstruction | A 3DGS-based bimanual hand-object interaction reconstruction method for monocular video reconstruction without category templates. | manipulation bi-manual bimanual manipulation
