cs.CV(2025-04-29)

📊 19 papers in total | 🔗 3 with code

🎯 Navigation by interest area

Pillar 9: Embodied Foundation Models (7 🔗2) · Pillar 2: RL & Architecture (4 🔗1) · Pillar 4: Generative Motion (3) · Pillar 3: Perception & Semantics (2) · Pillar 6: Video Extraction (1) · Pillar 1: Robot Control (1) · Pillar 5: Interaction & Reaction (1)

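🔬 Pillar 9: Embodied Foundation Models (7 papers)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Plant Disease Detection through Multimodal Large Language Models and Convolutional Neural Networks | Combines multimodal LLMs with CNNs for high-accuracy automatic detection of plant leaf diseases | large language model, multimodal | |
| 2 | X-Fusion: Introducing New Modality to Frozen Large Language Models | X-Fusion: introduces new modalities to frozen large language models, improving performance on multimodal tasks | large language model, multimodal | |
| 3 | FedMVP: Federated Multimodal Visual Prompt Tuning for Vision-Language Models | FedMVP: federated multimodal visual prompt tuning that improves the generalization of vision-language models | multimodal | |
| 4 | CMT: A Cascade MAR with Topology Predictor for Multimodal Conditional CAD Generation | Proposes the CMT framework for multimodal CAD generation | multimodal | |
| 5 | LMME3DHF: Benchmarking and Evaluating Multimodal 3D Human Face Generation with LMMs | Proposes LMME3DHF, an LMM-based evaluator of the quality and authenticity of AI-generated 3D human faces, and builds the large-scale benchmark Gen3DHF | multimodal | |
| 6 | LymphAtlas- A Unified Multimodal Lymphoma Imaging Repository Delivering AI-Enhanced Diagnostic Insight | Builds LymphAtlas, a multimodal lymphoma imaging dataset that delivers AI-enhanced diagnostic insight | multimodal | |
| 7 | Classifier-to-Bias: Toward Unsupervised Automatic Bias Detection for Visual Classifiers | Proposes Classifier-to-Bias (C2B) for unsupervised automatic bias detection in visual classifiers | large language model | |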

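🔬 Pillar 2: RL & Architecture (4 papers)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 8 | Large-scale visual SLAM for in-the-wild videos | Proposes a robust visual SLAM system for reconstructing online videos of unstructured scenes | predictive model, visual odometry, visual SLAM | |
| 9 | MambaMoE: Mixture-of-Spectral-Spatial-Experts State Space Model for Hyperspectral Image Classification | Proposes MambaMoE, which uses a mixture-of-experts model for hyperspectral image classification, improving accuracy and efficiency | Mamba, state space model, HSI | |
| 10 | SAM-Guided Robust Representation Learning for One-Shot 3D Medical Image Segmentation | Proposes the RRL-MedSAM framework, which uses SAM to improve one-shot 3D medical image segmentation | representation learning, distillation, foundation model | |
| 11 | DS_FusionNet: Dynamic Dual-Stream Fusion with Bidirectional Knowledge Distillation for Plant Disease Recognition | DS_FusionNet: dynamic dual-stream fusion with bidirectional knowledge distillation for plant disease recognition | distillation | |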

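🔬 Pillar 4: Generative Motion (3 papers)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 12 | AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation | AlignDiT: a multimodal aligned diffusion Transformer for synchronized speech generation | classifier-free guidance, multimodal | |
| 13 | Efficient Listener: Dyadic Facial Motion Synthesis via Action Diffusion | Proposes a diffusion-based facial motion generation method that efficiently synthesizes listener facial expressions in conversational settings | motion synthesis | |
| 14 | Floating Car Observers in Intelligent Transportation Systems: Detection Modeling and Temporal Insights | Proposes detection modeling and temporal analysis methods for vehicle detection with floating car observers in intelligent transportation systems | penetration | |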

🔬 Pillar 3: Perception & Semantics (2 papers)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 15 | GaussTrap: Stealthy Poisoning Attacks on 3D Gaussian Splatting for Targeted Scene Confusion | GaussTrap: stealthy poisoning attacks on 3D Gaussian Splatting for targeted scene confusion | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 16 | EfficientHuman: Efficient Training and Reconstruction of Moving Human using Articulated 2D Gaussian | EfficientHuman: fast training and reconstruction of moving humans using articulated 2D Gaussians | 3D gaussian splatting, 3DGS, gaussian splatting | |

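🔬 Pillar 6: Video Extraction (1 paper)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 17 | MemeBLIP2: A novel lightweight multimodal system to detect harmful memes | Proposes MemeBLIP2, a lightweight multimodal system for detecting harmful meme content | HuMoR, multimodal | |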

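🔬 Pillar 1: Robot Control (1 paper)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 18 | TesserAct: Learning 4D Embodied World Models | TesserAct: learns 4D world models for embodied agents, enabling spatiotemporally consistent scene prediction | manipulation, policy learning, world model | |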

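🔬 Pillar 5: Interaction & Reaction (1 paper)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 19 | PartHOI: Part-based Hand-Object Interaction Transfer via Generalized Cylinders | PartHOI: part-based hand-object interaction transfer using generalized cylinders | HOI | |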
