cs.CV (2024-10-13)

📊 17 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (8 🔗3) | Pillar 2: RL & Architecture (5 🔗1) | Pillar 3: Perception & Semantics (1) | Pillar 4: Generative Motion (1) | Pillar 8: Physics-based Animation (1) | Pillar 1: Robot Control (1)

🔬 Pillar 9: Embodied Foundation Models (8 papers)

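# | Title | One-sentence summary | Tags | 🔗
1 | LongHalQA: Long-Context Hallucination Evaluation for MultiModal Large Language Models | Proposes LongHalQA for evaluating hallucination in multimodal large language models in long-context scenarios. | large language model, multimodal
2 | MIRAGE: Multimodal Identification and Recognition of Annotations in Indian General Prescriptions | MIRAGE: uses multimodal large models to recognize handwritten annotations in Indian general prescriptions. | large language model, multimodal
3 | Data Adaptive Few-shot Multi Label Segmentation with Foundation Model | Proposes a data-adaptive few-shot multi-label segmentation method based on a foundation model, improving medical image segmentation performance. | foundation model
4 | LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models | Proposes LOKI, a comprehensive synthetic data detection benchmark that uses large multimodal models. | multimodal
5 | Text4Seg: Reimagining Image Segmentation as Text Generation | Text4Seg: recasts image segmentation as a text generation task, simplifying the segmentation pipeline. | large language model, multimodal
6 | Surgical-LLaVA: Toward Surgical Scenario Understanding via Large Language and Vision Models | Surgical-LLaVA: surgical scenario understanding via large language and vision models. | large language model, instruction following
7 | UnSeg: One Universal Unlearnable Example Generator is Enough against All Image Segmentation | Proposes UnSeg, which uses a universal unlearnable noise generator against image segmentation models. | foundation model
8 | Robust 3D Point Clouds Classification based on Declarative Defenders | Proposes a robust 3D point cloud classification method based on declarative defenses, improving performance under adversarial attacks. | foundation model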

🔬 Pillar 2: RL & Architecture (5 papers)

# | Title | One-sentence summary | Tags | 🔗
9 | t-READi: Transformer-Powered Robust and Efficient Multimodal Inference for Autonomous Driving | t-READi: Transformer-powered robust and efficient multimodal inference for autonomous driving. | contrastive learning, multimodal
10 | ViFi-ReID: A Two-Stream Vision-WiFi Multimodal Approach for Person Re-identification | Proposes ViFi-ReID, a two-stream vision-WiFi multimodal method for person re-identification. | contrastive learning, multimodal
11 | Large Model for Small Data: Foundation Model for Cross-Modal RF Human Activity Recognition | Proposes the FM-Fi framework, which uses vision foundation models to improve small-sample RF human activity recognition. | distillation, foundation model
12 | SlimSeiz: Efficient Channel-Adaptive Seizure Prediction Using a Mamba-Enhanced Network | SlimSeiz: efficient channel-adaptive seizure prediction using a Mamba-enhanced network. | Mamba
13 | SynFER: Towards Boosting Facial Expression Recognition with Synthetic Data | SynFER: improves facial expression recognition performance with synthetic data. | representation learning, foundation model

🔬 Pillar 3: Perception & Semantics (1 paper)

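# | Title | One-sentence summary | Tags | 🔗
14 | Magnituder Layers for Implicit Neural Representations in 3D | Proposes Magnituder layers, which reduce the parameter count of 3D implicit neural representations and speed up inference. | NeRF, neural radiance field, scene reconstruction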

🔬 Pillar 4: Generative Motion (1 paper)

# | Title | One-sentence summary | Tags | 🔗
15 | InterMask: 3D Human Interaction Generation via Collaborative Masked Modeling | InterMask: generates realistic 3D human interactions via collaborative masked modeling. | VQ-VAE

🔬 Pillar 8: Physics-based Animation (1 paper)

# | Title | One-sentence summary | Tags | 🔗
16 | EITNet: An IoT-Enhanced Framework for Real-Time Basketball Action Recognition | EITNet: an IoT-enhanced framework for real-time basketball action recognition. | spatiotemporal

🔬 Pillar 1: Robot Control (1 paper)

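# | Title | One-sentence summary | Tags | 🔗
17 | EBDM: Exemplar-guided Image Translation with Brownian-bridge Diffusion Models | Proposes EBDM, which uses Brownian-bridge diffusion models for exemplar-guided image translation. | manipulation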
