cs.CV (2025-10-18)

📊 17 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (9 🔗3) · Pillar 2: RL & Architecture (5 🔗1) · Pillar 3: Perception & Semantics (2 🔗1) · Pillar 1: Robot Control (1)

🔬 Pillar 9: Embodied Foundation Models (9 papers)

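| # | Title | One-Sentence Summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 1 | EDVD-LLaMA: Explainable Deepfake Video Detection via Multimodal Large Language Model Reasoning | Proposes the EDVD-LLaMA framework, which achieves explainable deepfake video detection through multimodal large language model reasoning. | large language model, multimodal, chain-of-thought | |
| 2 | VisionSelector: End-to-End Learnable Visual Token Compression for Efficient Multimodal LLMs | VisionSelector: end-to-end learnable visual token compression that improves the efficiency of multimodal LLMs. | large language model, multimodal | |
| 3 | Universal and Transferable Attacks on Pathology Foundation Models | Proposes UTAP, a universal and transferable adversarial perturbation, revealing the vulnerability of pathology foundation models. | foundation model | |
| 4 | PRISMM-Bench: A Benchmark of Peer-Review Grounded Multimodal Inconsistencies | Proposes PRISMM-Bench to address inconsistencies in multimodal scientific papers. | multimodal | |
| 5 | Structured Interfaces for Automated Reasoning with 3D Scene Graphs | Proposes a structured-interface approach to reasoning over 3D scene graphs, improving LLM performance on natural language understanding for robots. | large language model, instruction following | |
| 6 | NavQ: Learning a Q-Model for Foresighted Vision-and-Language Navigation | NavQ: learns a Q-model for foresighted vision-and-language navigation. | VLN | |
| 7 | VIPAMIN: Visual Prompt Initialization via Embedding Selection and Subspace Expansion | VIPAMIN: visual prompt initialization via embedding selection and subspace expansion, improving the downstream-task performance of self-supervised models. | foundation model | |
| 8 | iWatchRoadv2: Pothole Detection, Geospatial Mapping, and Intelligent Road Governance | iWatchRoadv2: a YOLO-based platform for real-time pothole detection, geospatial mapping, and intelligent road governance. | TAMP | |
| 9 | Cerberus: Real-Time Video Anomaly Detection via Cascaded Vision-Language Models | Cerberus: a real-time video anomaly detection system based on cascaded vision-language models. | visual grounding | |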

🔬 Pillar 2: RL & Architecture (5 papers)

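| # | Title | One-Sentence Summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 10 | Self-Supervised Learning to Fly using Efficient Semantic Segmentation and Metric Depth Estimation for Low-Cost Autonomous UAVs | Proposes a low-cost autonomous UAV flight method based on semantic segmentation and monocular depth estimation. | distillation, depth estimation, monocular depth | |
| 11 | SSL4RL: Revisiting Self-supervised Learning as Intrinsic Reward for Visual-Language Reasoning | SSL4RL: uses self-supervised learning as an intrinsic reward for visual-language reasoning, improving model performance. | reinforcement learning, large language model, multimodal | |
| 12 | HYDRA: HYbrid knowledge Distillation and spectral Reconstruction Algorithm for high channel hyperspectral camera applications | Proposes HYDRA, which improves performance in high-channel hyperspectral camera applications through hybrid knowledge distillation and a spectral reconstruction algorithm. | distillation, HSI | |
| 13 | Instance-Aware Pseudo-Labeling and Class-Focused Contrastive Learning for Weakly Supervised Domain Adaptive Segmentation of Electron Microscopy | Proposes instance-aware pseudo-labeling and class-focused contrastive learning for weakly supervised domain-adaptive segmentation of electron microscopy images. | contrastive learning | |
| 14 | RL makes MLLMs see better than SFT | Proposes PIVOT, which optimizes the MLLM vision encoder via reinforcement learning, significantly improving visual perception. | reinforcement learning, multimodal | |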

🔬 Pillar 3: Perception & Semantics (2 papers)

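| # | Title | One-Sentence Summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 15 | REALM: An MLLM-Agent Framework for Open World 3D Reasoning Segmentation and Editing on Gaussian Splatting | REALM: an MLLM-agent framework for open-world 3D reasoning segmentation and editing on Gaussian splatting. | 3D gaussian splatting, gaussian splatting, splatting | |
| 16 | HGC-Avatar: Hierarchical Gaussian Compression for Streamable Dynamic 3D Avatars | Proposes HGC-Avatar, an efficient Gaussian compression method for streamable dynamic 3D avatars. | 3D gaussian splatting, 3DGS, gaussian splatting | |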

🔬 Pillar 1: Robot Control (1 paper)

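| # | Title | One-Sentence Summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 17 | Fit for Purpose? Deepfake Detection in the Real World | Builds a benchmark of real-world political deepfakes, revealing that existing detectors generalize poorly. | manipulation, large language model, multimodal | |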
