cs.CV (2025-09-21)

📊 20 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (9 🔗3) · Pillar 3: Perception & Semantics (6 🔗2) · Pillar 2: RL & Architecture (5 🔗1)

🔬 Pillar 9: Embodied Foundation Models (9 papers)

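| # | Title | Key point | Tags | 🔗 |
|---|-------|-----------|------|----|
| 1 | Automated Facility Enumeration for Building Compliance Checking using Door Detection and Large Language Models | Proposes an automated facility enumeration method based on door detection and large language models for building compliance checking. | large language model, chain-of-thought | |
| 2 | AgriDoctor: A Multimodal Intelligent Assistant for Agriculture | Proposes AgriDoctor, a multimodal intelligent assistant that addresses crop disease diagnosis in agriculture. | large language model, multimodal | |
| 3 | Multimodal Prompt Decoupling Attack on the Safety Filters in Text-to-Image Models | Proposes the Multimodal Prompt Decoupling Attack (MPDA), which raises the success rate of jailbreak attacks on text-to-image models. | large language model, multimodal | |
| 4 | SAEC: Scene-Aware Enhanced Edge-Cloud Collaborative Industrial Vision Inspection with Multimodal LLM | Proposes the SAEC framework, which uses a multimodal LLM for scene-aware edge-cloud collaborative industrial vision inspection. | multimodal | |
| 5 | Parameter-efficient fine-tuning (PEFT) of Vision Foundation Models for Atypical Mitotic Figure Classification | Classifies atypical mitotic figures using parameter-efficient fine-tuning of vision foundation models. | foundation model | |
| 6 | Catching the Details: Self-Distilled RoI Predictors for Fine-Grained MLLM Perception | Proposes a self-distilled RoI predictor network that improves fine-grained MLLM perception without large-scale annotation. | large language model, multimodal | |
| 7 | MDF-MLLM: Deep Fusion Through Cross-Modal Feature Alignment for Contextually Aware Fundoscopic Image Classification | Proposes MDF-MLLM to address insufficient accuracy in retinal image classification. | large language model, multimodal | |
| 8 | The 1st Solution for 7th LSVOS RVOS Track: SaSaSa2VA | Proposes SaSaSa2VA to address key bottlenecks in video object segmentation. | large language model | |
| 9 | ISCS: Parameter-Guided Feature Pruning for Resource-Constrained Embodied Perception | Proposes ISCS, a parameter-guided feature pruning method for resource-constrained embodied perception. | embodied AI | |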

🔬 Pillar 3: Perception & Semantics (6 papers)

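| # | Title | Key point | Tags | 🔗 |
|---|-------|-----------|------|----|
| 10 | ConfidentSplat: Confidence-Weighted Depth Fusion for Accurate 3D Gaussian Splatting SLAM | ConfidentSplat: accurate 3D Gaussian splatting SLAM with confidence-weighted depth fusion. | visual SLAM, depth estimation, 3D gaussian splatting | |
| 11 | SPFSplatV2: Efficient Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views | SPFSplatV2: self-supervised, pose-free 3D Gaussian splatting from sparse views. | 3D gaussian splatting, gaussian splatting, splatting | |
| 12 | Informative Text-Image Alignment for Visual Affordance Learning with Foundation Models | Proposes an informative alignment framework that uses foundation models to improve text-guided visual affordance learning. | affordance, foundation model | |
| 13 | DT-NeRF: A Diffusion and Transformer-Based Optimization Approach for Neural Radiance Fields in 3D Reconstruction | Proposes DT-NeRF, based on diffusion models and Transformers, to improve detail and multi-view consistency in 3D reconstruction. | NeRF, neural radiance field, scene reconstruction | |
| 14 | Efficient 3D Scene Reconstruction and Simulation from Sparse Endoscopic Views | Proposes a Gaussian-splatting-based framework for efficient reconstruction and interactive simulation of endoscopic surgical scenes. | gaussian splatting, splatting, scene reconstruction | |
| 15 | HyRF: Hybrid Radiance Fields for Memory-efficient and High-quality Novel View Synthesis | Proposes HyRF, which combines explicit Gaussians with neural fields for memory-efficient, high-quality novel view synthesis. | 3D gaussian splatting, 3DGS, gaussian splatting | |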

🔬 Pillar 2: RL & Architecture (5 papers)

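| # | Title | Key point | Tags | 🔗 |
|---|-------|-----------|------|----|
| 16 | ME-Mamba: Multi-Expert Mamba with Efficient Knowledge Capture and Fusion for Multimodal Survival Analysis | Proposes ME-Mamba for multimodal survival analysis that efficiently fuses pathology images and genomic data. | Mamba, multimodal | |
| 17 | VaseVQA: Multimodal Agent and Benchmark for Ancient Greek Pottery | Proposes the VaseVQA benchmark and the VaseVL model to improve the expert-level understanding of ancient Greek pottery by multimodal large models. | reinforcement learning, multimodal | |
| 18 | From Easy to Hard: The MIR Benchmark for Progressive Interleaved Multi-Image Reasoning | Proposes the MIR benchmark for evaluating multimodal large language models on interleaved multi-image reasoning. | curriculum learning, large language model | |
| 19 | PRISM: Precision-Recall Informed Data-Free Knowledge Distillation via Generative Diffusion | PRISM: precision-recall informed, data-free knowledge distillation via generative diffusion models. | distillation | |
| 20 | Learning from Gene Names, Expression Values and Images: Contrastive Masked Text-Image Pretraining for Spatial Transcriptomics Representation Learning | Proposes the CoMTIP framework for representation learning in spatial transcriptomics via contrastive masked text-image pretraining. | representation learning | |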
