cs.CV(2024-07-29)

📊 21 papers in total | 🔗 9 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (10 🔗4) · Pillar 3: Perception & Semantics (5 🔗2) · Pillar 2: RL & Architecture (4 🔗3) · Pillar 8: Physics-based Animation (2)

🔬 Pillar 9: Embodied Foundation Models (10 papers)

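| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 1 | Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning | Proposes visualization-referenced instruction tuning to improve multimodal large language models on chart question answering. | large language model, multimodal |
| 2 | Urban Safety Perception Assessments via Integrating Multimodal Large Language Models with Street View Images | Proposes an urban safety perception assessment method that combines multimodal large language models with street view images. | large language model, multimodal |
| 3 | Classification of freshwater snails of the genus Radomaniola with multimodal triplet networks | Proposes a multimodal triplet network method for classifying freshwater snails of the genus Radomaniola. | multimodal |
| 4 | Twins-PainViT: Towards a Modality-Agnostic Vision Transformer Framework for Multimodal Automatic Pain Assessment using Facial Videos and fNIRS | Twins-PainViT: a modality-agnostic Vision Transformer framework for multimodal automatic pain assessment from facial videos and fNIRS. | multimodal |
| 5 | Diffusion Feedback Helps CLIP See Better | DIVA: uses diffusion model feedback to improve CLIP's fine-grained visual capabilities. | large language model, multimodal |
| 6 | BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues | Proposes BRIDGE, which bridges gaps in image captioning evaluation through stronger visual cues. | multimodal |
| 7 | Specify and Edit: Overcoming Ambiguity in Text-Based Image Editing | Proposes SANE, which uses an LLM to decompose instructions and resolve ambiguity in text-based image editing. | large language model |
| 8 | FlexAttention for Efficient High-Resolution Vision-Language Models | FlexAttention: an efficient attention mechanism for high-resolution vision-language models. | multimodal |
| 9 | Adversarial Robustness in RGB-Skeleton Action Recognition: Leveraging Attention Modality Reweighter | Proposes AMR, an attention-based modality reweighting method, to improve the adversarial robustness of RGB-skeleton action recognition models. | multimodal |
| 10 | Synthetic Thermal and RGB Videos for Automatic Pain Assessment utilizing a Vision-MLP Architecture | Proposes a Vision-MLP method for automatic pain assessment based on GAN-synthesized thermal videos. | multimodal |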

🔬 Pillar 3: Perception & Semantics (5 papers)

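| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 11 | Garment Animation NeRF with Color Editing | Proposes Garment Animation NeRF for high-quality neural rendering of garment animation with color editing. | NeRF, neural radiance field |
| 12 | MaskInversion: Localized Embeddings via Optimization of Explainability Maps | Proposes MaskInversion, which generates context-aware embeddings for local image regions by optimizing explainability maps. | open-vocabulary, open vocabulary, foundation model |
| 13 | Event-based Optical Flow on Neuromorphic Processor: ANN vs. SNN Comparison based on Activation Sparsification | Proposes an event-based optical flow algorithm that uses activation sparsification on a neuromorphic processor, and compares the energy efficiency of ANNs and SNNs. | optical flow |
| 14 | Analysis and Improvement of Rank-Ordered Mean Algorithm in Single-Photon LiDAR | Addresses the limitations of the ROM algorithm in single-photon LiDAR with an improved signal extraction technique that significantly improves depth estimation performance. | depth estimation, TAMP |
| 15 | Improving 2D Feature Representations by 3D-Aware Fine-Tuning | Proposes FiT3D, a 3D-aware fine-tuning method that improves 2D feature representations and downstream task performance. | depth estimation, foundation model |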

🔬 Pillar 2: RL & Architecture (4 papers)

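| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 16 | ML-Mamba: Efficient Multi-Modal Large Language Model Utilizing Mamba-2 | ML-Mamba: an efficient multimodal large language model built on Mamba-2. | Mamba, state space model, large language model |
| 17 | BaseBoostDepth: Exploiting Larger Baselines For Self-supervised Monocular Depth Estimation | BaseBoostDepth: uses larger baselines to improve the accuracy of self-supervised monocular depth estimation. | curriculum learning, depth estimation, monocular depth |
| 18 | Contextuality Helps Representation Learning for Generalized Category Discovery | Proposes a context-learning-based method for generalized category discovery that improves category recognition and classification accuracy on unlabeled datasets. | representation learning, contrastive learning |
| 19 | SalNAS: Efficient Saliency-prediction Neural Architecture Search with self-knowledge distillation | Proposes SalNAS to address the inefficiency of designing saliency prediction models. | distillation |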

🔬 Pillar 8: Physics-based Animation (2 papers)

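| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 20 | SpaER: Learning Spatio-temporal Equivariant Representations for Fetal Brain Motion Tracking | SpaER: learns spatio-temporal equivariant representations for fetal brain motion tracking. | motion tracking |
| 21 | FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention | Proposes FreeLong, a training-free long video generation method that improves video quality through SpectralBlend temporal attention. | spatiotemporal |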
