cs.CV(2024-09-28)
📊 14 papers in total | 🔗 2 with code
🎯 Topic Navigation
Pillar 2: RL Algorithms & Architecture (5 🔗1)
Pillar 3: Spatial Perception & Semantics (3)
Pillar 9: Embodied Foundation Models (3 🔗1)
Pillar 6: Video Extraction & Matching (2)
Pillar 1: Robot Control (1)
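🔬 Pillar 2: RL Algorithms & Architecture (5 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | Contrastive ground-level image and remote sensing pre-training improves representation learning for natural world imagery | Proposes CRISP, a contrastive pre-training approach that combines ground-level and remote sensing imagery to improve representation learning for natural-world images. | representation learning, contrastive learning, multimodal | | |
| 2 | CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling | Proposes CLIP-MoE to address information loss in CLIP's feature space. | contrastive learning, large language model, multimodal | | |
| 3 | X-Prompt: Multi-modal Visual Prompt for Video Object Segmentation | Proposes the X-Prompt framework to address limited generality and data scarcity in multi-modal video object segmentation. | MAE, foundation model | ✅ | |
| 4 | Multi-Atlas Brain Network Classification through Consistency Distillation and Complementary Information Fusion | Proposes AIDFusion, a network that fuses information from multiple brain atlases and applies consistency distillation to improve brain network classification. | distillation | | |
| 5 | Beyond Euclidean: Dual-Space Representation Learning for Weakly Supervised Video Violence Detection | Proposes a dual-space representation learning method to tackle the recognition of ambiguous violence in weakly supervised video violence detection. | representation learning | | |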
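🔬 Pillar 3: Spatial Perception & Semantics (3 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 6 | GS-EVT: Cross-Modal Event Camera Tracking based on Gaussian Splatting | Proposes a cross-modal event camera tracking method based on Gaussian Splatting, targeting robust localization under dynamic motion and lighting changes. | gaussian splatting, splatting, motion tracking | | |
| 7 | G3R: Gradient Guided Generalizable Reconstruction | G3R: gradient-guided generalizable reconstruction that reconstructs large-scale scenes efficiently and with high quality. | 3DGS, NeRF, scene reconstruction | | |
| 8 | Fast Encoding and Decoding for Implicit Video Representation | Proposes NeRV-Enc and NeRV-Dec to speed up encoding and decoding of implicit video representations. | implicit representation | | |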
🔬 Pillar 9: Embodied Foundation Models (3 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 9 | FairPIVARA: Reducing and Assessing Biases in CLIP-Based Multimodal Models | FairPIVARA: improves the fairness of CLIP-based multimodal models by removing biased features. | multimodal | ✅ | |
| 10 | Harnessing Frozen Unimodal Encoders for Flexible Multimodal Alignment | Proposes a multimodal alignment framework built on frozen unimodal encoders, lowering the cost of developing multimodal models. | multimodal | | |
| 11 | TrojVLM: Backdoor Attack Against Vision Language Models | TrojVLM: a study of backdoor attacks against vision language models. | large language model, multimodal | | |
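🔬 Pillar 6: Video Extraction & Matching (2 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 12 | 1st Place Solution of Multiview Egocentric Hand Tracking Challenge ECCV2024 | Proposes a multi-view hand tracking method that combines data augmentation with post-processing to significantly improve accuracy for VR interaction. | egocentric | | |
| 13 | EEPNet: Efficient Edge Pixel-based Matching Network for Cross-Modal Dynamic Registration between LiDAR and Camera | EEPNet: an efficient edge-pixel-based matching network for cross-modal dynamic registration between LiDAR and camera. | feature matching | | |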
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 14 | 1st Place Solution to the 8th HANDS Workshop Challenge -- ARCTIC Track: 3DGS-based Bimanual Category-agnostic Interaction Reconstruction | A 3DGS-based method for reconstructing bimanual hand-object interactions, tackling reconstruction from monocular video without category-specific templates. | manipulation, bi-manual, bimanual manipulation | | |