cs.CV (2025-09-21)

📊 20 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (9 🔗3) · Pillar 3: Perception & Semantics (6 🔗2) · Pillar 2: RL & Architecture (5 🔗1)

🔬 Pillar 9: Embodied Foundation Models (9 papers)

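#	Title	One-sentence summary	Tags	🔗	⭐
1	Automated Facility Enumeration for Building Compliance Checking using Door Detection and Large Language Models	Proposes an automated facility enumeration method based on door detection and large language models for building compliance checking.	large language model chain-of-thought
2	AgriDoctor: A Multimodal Intelligent Assistant for Agriculture	Proposes AgriDoctor, a multimodal intelligent assistant for crop disease diagnosis in agriculture.	large language model multimodal
3	Multimodal Prompt Decoupling Attack on the Safety Filters in Text-to-Image Models	Proposes the Multimodal Prompt Decoupling Attack (MPDA), which raises the success rate of jailbreak attacks on text-to-image models.	large language model multimodal
4	SAEC: Scene-Aware Enhanced Edge-Cloud Collaborative Industrial Vision Inspection with Multimodal LLM	Proposes the SAEC framework, which uses a multimodal LLM for scene-aware, edge-cloud collaborative industrial vision inspection.	multimodal	✅
5	Parameter-efficient fine-tuning (PEFT) of Vision Foundation Models for Atypical Mitotic Figure Classification	Applies parameter-efficient fine-tuning of vision foundation models to atypical mitotic figure classification.	foundation model
6	Catching the Details: Self-Distilled RoI Predictors for Fine-Grained MLLM Perception	Proposes a self-distilled RoI predictor network that improves fine-grained MLLM perception without large-scale annotation.	large language model multimodal	✅
7	MDF-MLLM: Deep Fusion Through Cross-Modal Feature Alignment for Contextually Aware Fundoscopic Image Classification	Proposes MDF-MLLM to address insufficient accuracy in retinal image classification.	large language model multimodal
8	The 1st Solution for 7th LSVOS RVOS Track: SaSaSa2VA	Proposes SaSaSa2VA to address key bottlenecks in video object segmentation.	large language model	✅
9	ISCS: Parameter-Guided Feature Pruning for Resource-Constrained Embodied Perception	Proposes ISCS, a parameter-guided feature pruning method for resource-constrained embodied perception.	embodied AI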

🔬 Pillar 3: Perception & Semantics (6 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
10	ConfidentSplat: Confidence-Weighted Depth Fusion for Accurate 3D Gaussian Splatting SLAM	ConfidentSplat: accurate 3D Gaussian Splatting SLAM via confidence-weighted depth fusion.	visual SLAM depth estimation 3D gaussian splatting
11	SPFSplatV2: Efficient Self-Supervised Pose-Free 3D Gaussian Splatting from Sparse Views	SPFSplatV2: self-supervised, pose-free 3D Gaussian Splatting from sparse views.	3D gaussian splatting gaussian splatting splatting	✅
12	Informative Text-Image Alignment for Visual Affordance Learning with Foundation Models	Proposes an informative text-image alignment framework that uses foundation models to improve text-guided visual affordance learning.	affordance foundation model
13	DT-NeRF: A Diffusion and Transformer-Based Optimization Approach for Neural Radiance Fields in 3D Reconstruction	Proposes DT-NeRF, based on diffusion models and Transformers, to improve detail and multi-view consistency in 3D reconstruction.	NeRF neural radiance field scene reconstruction
14	Efficient 3D Scene Reconstruction and Simulation from Sparse Endoscopic Views	Proposes a Gaussian-splatting-based framework for efficient reconstruction and interactive simulation of endoscopic surgical scenes.	gaussian splatting splatting scene reconstruction
15	HyRF: Hybrid Radiance Fields for Memory-efficient and High-quality Novel View Synthesis	Proposes HyRF, which combines explicit Gaussians with neural fields for memory-efficient, high-quality novel view synthesis.	3D gaussian splatting 3DGS gaussian splatting	✅

🔬 Pillar 2: RL & Architecture (5 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
16	ME-Mamba: Multi-Expert Mamba with Efficient Knowledge Capture and Fusion for Multimodal Survival Analysis	Proposes ME-Mamba for multimodal survival analysis that efficiently fuses pathology images and genomic data.	Mamba multimodal
17	VaseVQA: Multimodal Agent and Benchmark for Ancient Greek Pottery	Proposes the VaseVQA benchmark and the VaseVL model to improve the expert-level understanding of ancient Greek pottery in multimodal large models.	reinforcement learning multimodal	✅
18	From Easy to Hard: The MIR Benchmark for Progressive Interleaved Multi-Image Reasoning	Proposes the MIR benchmark for evaluating multimodal large language models on interleaved multi-image reasoning.	curriculum learning large language model
19	PRISM: Precision-Recall Informed Data-Free Knowledge Distillation via Generative Diffusion	PRISM: precision-recall informed, data-free knowledge distillation via generative diffusion models.	distillation
20	Learning from Gene Names, Expression Values and Images: Contrastive Masked Text-Image Pretraining for Spatial Transcriptomics Representation Learning	Proposes the CoMTIP framework for spatial transcriptomics representation learning based on contrastive masked text-image pretraining.	representation learning
