cs.CV (2024-09-28)

📊 14 papers in total | 🔗 2 with code

🎯 Browse by interest area

Pillar 2: RL & Architecture (5 🔗1) · Pillar 3: Perception & Semantics (3) · Pillar 9: Embodied Foundation Models (3 🔗1) · Pillar 6: Video Extraction (2) · Pillar 1: Robot Control (1)

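🔬 Pillar 2: RL & Architecture (5 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
1	Contrastive ground-level image and remote sensing pre-training improves representation learning for natural world imagery	Proposes CRISP, a contrastive pre-training method that combines ground-level and remote sensing images to improve representation learning for natural world imagery.	representation learning contrastive learning multimodal
2	CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling	Proposes CLIP-MoE to address information loss in CLIP's feature space.	contrastive learning large language model multimodal
3	X-Prompt: Multi-modal Visual Prompt for Video Object Segmentation	Proposes the X-Prompt framework to address generality and data scarcity in multi-modal video object segmentation.	MAE foundation model	✅
4	Multi-Atlas Brain Network Classification through Consistency Distillation and Complementary Information Fusion	Proposes the AIDFusion network, which fuses multi-atlas information and applies consistency distillation to improve brain network classification.	distillation
5	Beyond Euclidean: Dual-Space Representation Learning for Weakly Supervised Video Violence Detection	Proposes a dual-space representation learning method to address the difficulty of recognizing ambiguous violence in weakly supervised video violence detection.	representation learning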

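🔬 Pillar 3: Perception & Semantics (3 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
6	GS-EVT: Cross-Modal Event Camera Tracking based on Gaussian Splatting	Proposes a cross-modal event camera tracking method based on Gaussian splatting for robust localization under dynamic conditions and illumination changes.	gaussian splatting splatting motion tracking
7	G3R: Gradient Guided Generalizable Reconstruction	G3R: gradient-guided generalizable reconstruction that reconstructs large-scale scenes efficiently and with high quality.	3DGS NeRF scene reconstruction
8	Fast Encoding and Decoding for Implicit Video Representation	Proposes NeRV-Enc and NeRV-Dec to address the encoding and decoding speed of implicit video representations.	implicit representation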

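🔬 Pillar 9: Embodied Foundation Models (3 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
9	FairPIVARA: Reducing and Assessing Biases in CLIP-Based Multimodal Models	FairPIVARA: improves the fairness of CLIP-based multimodal models by removing feature biases.	multimodal	✅
10	Harnessing Frozen Unimodal Encoders for Flexible Multimodal Alignment	Proposes a multimodal alignment framework built on frozen unimodal encoders, lowering the cost of developing multimodal models.	multimodal
11	TrojVLM: Backdoor Attack Against Vision Language Models	TrojVLM: a study of backdoor attacks against vision language models.	large language model multimodal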

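🔬 Pillar 6: Video Extraction (2 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
12	1st Place Solution of Multiview Egocentric Hand Tracking Challenge ECCV2024	Proposes a multi-view hand tracking method that combines data augmentation with post-processing, significantly improving VR interaction accuracy.	egocentric
13	EEPNet: Efficient Edge Pixel-based Matching Network for Cross-Modal Dynamic Registration between LiDAR and Camera	EEPNet: an efficient edge-pixel-based matching network for cross-modal dynamic registration between LiDAR and camera.	feature matching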

🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
14	1st Place Solution to the 8th HANDS Workshop Challenge -- ARCTIC Track: 3DGS-based Bimanual Category-agnostic Interaction Reconstruction	A 3DGS-based method for reconstructing two-hand-object interactions, addressing monocular video reconstruction without category templates.	manipulation bi-manual bimanual manipulation
