cs.CV(2024-08-07)

📊 22 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (10 🔗3) · Pillar 2: RL & Architecture (7 🔗2) · Pillar 3: Perception & Semantics (4 🔗1) · Pillar 1: Robot Control (1)

🔬 Pillar 9: Embodied Foundation Models (10 papers)

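| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Advancing Multimodal Large Language Models with Quantization-Aware Scale Learning for Efficient Adaptation | Proposes QSLAW, a quantization-aware scale learning method for efficiently adapting multimodal large language models. | large language model, multimodal | |
| 2 | MMSummary: Multimodal Summary Generation for Fetal Ultrasound Video | MMSummary: a multimodal summary generation system for fetal ultrasound video that improves clinical workflow efficiency. | large language model, multimodal | |
| 3 | PaveCap: The First Multimodal Framework for Comprehensive Pavement Condition Assessment with Dense Captioning and PCI Estimation | PaveCap: the first multimodal framework for comprehensive pavement condition assessment, providing dense captioning and PCI estimation. | multimodal | |
| 4 | AgentsCoMerge: Large Language Model Empowered Collaborative Decision Making for Ramp Merging | Proposes AgentsCoMerge, which uses large language models to enable collaborative decision making in ramp merging scenarios. | large language model | |
| 5 | Openstory++: A Large-scale Dataset and Benchmark for Instance-aware Open-domain Visual Storytelling | Openstory++: a large-scale dataset and benchmark for instance-aware open-domain visual storytelling. | large language model, multimodal | |
| 6 | MoExtend: Tuning New Experts for Modality and Task Extension | MoExtend: extends models to new modalities and tasks by tuning new expert modules. | large language model, multimodal | |
| 7 | Handwritten Code Recognition for Pen-and-Paper CS Education | Proposes a handwritten code recognition method that combines OCR, indentation recognition, and language models to improve pen-and-paper CS education. | multimodal | |
| 8 | Task-oriented Sequential Grounding and Navigation in 3D Scenes | Proposes the SG3D dataset and the SG-LLM model for task-oriented sequential grounding and navigation in 3D scenes. | visual grounding | |
| 9 | How Well Can Vision Language Models See Image Details? | Proposes a pixel value prediction task to improve vision language models' perception of image details. | large language model | |
| 10 | AdapMTL: Adaptive Pruning Framework for Multitask Learning Model | AdapMTL: an adaptive pruning framework for multitask learning models that improves efficiency and accuracy. | multimodal | |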

🔬 Pillar 2: RL & Architecture (7 papers)

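| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 11 | Unlocking Exocentric Video-Language Data for Egocentric Video Representation Learning | EMBED: uses exocentric video-language data to improve egocentric video representation learning. | representation learning, egocentric | |
| 12 | PoseMamba: Monocular 3D Human Pose Estimation with Bidirectional Global-Local Spatio-Temporal State Space Model | PoseMamba: monocular 3D human pose estimation with a bidirectional global-local spatio-temporal state space model. | Mamba, SSM, state space model | |
| 13 | FOVAL: Calibration-Free and Subject-Invariant Fixation Depth Estimation Across Diverse Eye-Tracking Datasets | Proposes FOVAL, a calibration-free and subject-invariant fixation depth estimation method applicable across diverse eye-tracking datasets. | MAE, depth estimation, spatiotemporal | |
| 14 | FacialPulse: An Efficient RNN-based Depression Detection via Temporal Facial Landmarks | Proposes FacialPulse, which efficiently detects depression from temporal facial landmarks. | MAE, PULSE | |
| 15 | FMiFood: Multi-modal Contrastive Learning for Food Image Classification | Proposes FMiFood, a multi-modal contrastive learning framework that improves food image classification accuracy. | contrastive learning | |
| 16 | Dual-Modeling Decouple Distillation for Unsupervised Anomaly Detection | Proposes a dual-modeling decouple distillation method for unsupervised anomaly detection. | distillation | |
| 17 | Weakly Contrastive Learning via Batch Instance Discrimination and Feature Clustering for Small Sample SAR ATR | Proposes a weakly contrastive learning framework based on batch instance discrimination and feature clustering for small-sample SAR ATR. | contrastive learning | |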

🔬 Pillar 3: Perception & Semantics (4 papers)

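| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 18 | Compact 3D Gaussian Splatting for Static and Dynamic Radiance Fields | Proposes compact 3D Gaussian splatting for compressing and accelerating static and dynamic radiance fields. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 19 | 3iGS: Factorised Tensorial Illumination for 3D Gaussian Splatting | Proposes 3iGS, which improves the rendering quality of 3D Gaussian splatting through factorised tensorial illumination, especially for view-dependent specular effects. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 20 | Query3D: LLM-Powered Open-Vocabulary Scene Segmentation with Language Embedded 3D Gaussian | Query3D: open-vocabulary scene segmentation using LLM-powered, language-embedded 3D Gaussians. | open-vocabulary, open vocabulary, large language model | |
| 21 | PRTGS: Precomputed Radiance Transfer of Gaussian Splats for Real-Time High-Quality Relighting | PRTGS: precomputed radiance transfer for Gaussian splats, achieving real-time, high-quality dynamic lighting effects. | 3D gaussian splatting, 3DGS, gaussian splatting | |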

🔬 Pillar 1: Robot Control (1 paper)

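| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 22 | TALE: Training-free Cross-domain Image Composition via Adaptive Latent Manipulation and Energy-guided Optimization | Proposes TALE, a training-free cross-domain image composition framework based on adaptive latent manipulation and energy-guided optimization. | manipulation, latent optimization | |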
