cs.CV (2025-10-26)

📊 20 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (12 🔗4) · Pillar 3: Perception & Semantics (4) · Pillar 2: RL & Architecture (4)

🔬 Pillar 9: Embodied Foundation Models (12 papers)

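| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 1 | Windsock is Dancing: Adaptive Multimodal Retrieval-Augmented Generation | Windsock: adaptive multimodal retrieval-augmented generation that improves the quality and efficiency of MLLM responses. | large language model, multimodal | |
| 2 | SARVLM: A Vision Language Foundation Model for Semantic Understanding and Target Recognition in SAR Imagery | Proposes SARVLM, a vision-language foundation model for semantic understanding and target recognition in SAR imagery. | foundation model, multimodal | |
| 3 | DeepfakeBench-MM: A Comprehensive Benchmark for Multimodal Deepfake Detection | Builds DeepfakeBench-MM, a multimodal deepfake detection benchmark, to address the societal risks posed by forged audio-visual content. | multimodal | |
| 4 | Open Multimodal Retrieval-Augmented Factual Image Generation | Proposes the ORIG framework, which uses open multimodal retrieval augmentation to address inaccurate knowledge in factual image generation. | multimodal | |
| 5 | GateFuseNet: An Adaptive 3D Multimodal Neuroimaging Fusion Network for Parkinson's Disease Diagnosis | Proposes GateFuseNet, an adaptive 3D multimodal fusion network that assists Parkinson's disease diagnosis. | multimodal | |
| 6 | FairJudge: MLLM Judging for Social Attributes and Prompt Image Alignment | FairJudge: uses a multimodal LLM to evaluate the fairness of image generation models with respect to social attributes and prompt alignment. | multimodal, instruction following | |
| 7 | Mitigating Attention Sinks and Massive Activations in Audio-Visual Speech Recognition with LLMs | Proposes a decoupling loss to address attention sinks and massive activations in LLM-based AVSR. | large language model, multimodal | |
| 8 | LLM-based Fusion of Multi-modal Features for Commercial Memorability Prediction | Proposes an LLM-based multimodal fusion method that improves the robustness and generalization of commercial memorability prediction. | multimodal | |
| 9 | VADTree: Explainable Training-Free Video Anomaly Detection via Hierarchical Granularity-Aware Tree | Proposes VADTree, which achieves explainable, training-free video anomaly detection via a hierarchical granularity-aware tree. | large language model | |
| 10 | RoboSVG: A Unified Framework for Interactive SVG Generation with Multi-modal Guidance | RoboSVG: a unified framework for interactive SVG generation with multimodal guidance. | multimodal | |
| 11 | PSScreen V2: Partially Supervised Multiple Retinal Disease Screening | PSScreen V2: proposes a partially supervised self-training framework for screening multiple retinal diseases. | foundation model | |
| 12 | STATUS Bench: A Rigorous Benchmark for Evaluating Object State Understanding in Vision-Language Models | STATUS Bench: a rigorous benchmark for evaluating how well vision-language models understand object states. | multimodal | |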

🔬 Pillar 3: Perception & Semantics (4 papers)

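| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 13 | LVD-GS: Gaussian Splatting SLAM for Dynamic Scenes via Hierarchical Explicit-Implicit Representation Collaboration Rendering | LVD-GS: proposes a Gaussian splatting SLAM system for dynamic scenes based on hierarchical explicit-implicit representation collaboration rendering. | 3D gaussian splatting, gaussian splatting, splatting | |
| 14 | Look and Tell: A Dataset for Multimodal Grounding Across Egocentric and Exocentric Views | Proposes the Look and Tell dataset for studying multimodal referential communication across first-person and third-person views. | scene reconstruction, egocentric, multimodal | |
| 15 | DynaPose4D: High-Quality 4D Dynamic Content Generation via Pose Alignment Loss | DynaPose4D: generates high-quality 4D dynamic content via a pose alignment loss. | 3D gaussian splatting, gaussian splatting, splatting | |
| 16 | Seeing the Unseen: Towards Zero-Shot Inspection for Wind Turbine Blades using Knowledge-Augmented Vision Language Models | Proposes a zero-shot defect inspection method for wind turbine blades based on knowledge-augmented vision-language models. | open-vocabulary, open vocabulary, multimodal | |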

🔬 Pillar 2: RL & Architecture (4 papers)

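| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 17 | Edge Collaborative Gaussian Splatting with Integrated Rendering and Communication | Proposes ECO-GS, which improves rendering quality on low-cost devices via edge collaborative Gaussian splatting. | imitation learning, gaussian splatting, splatting | |
| 18 | Mutual Information guided Visual Contrastive Learning | Proposes mutual-information-guided visual contrastive learning to improve how well representations generalize in open environments. | representation learning, contrastive learning | |
| 19 | Alias-Free ViT: Fractional Shift Invariance via Linear Attention | Proposes Alias-Free ViT, which achieves fractional shift invariance via linear attention and improves ViT robustness. | linear attention | |
| 20 | Single-Teacher View Augmentation: Boosting Knowledge Distillation via Angular Diversity | Proposes a single-teacher view augmentation method for knowledge distillation that improves student model performance through angular diversity. | distillation | |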
