cs.CV (2025-07-04)

📊 23 papers in total | 🔗 7 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (9 🔗1) · Pillar 2: RL & Architecture (7 🔗3) · Pillar 3: Perception & Semantics (4 🔗3) · Pillar 1: Robot Control (2) · Pillar 7: Motion Retargeting (1)

🔬 Pillar 9: Embodied Foundation Models (9 papers)

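| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Investigating Redundancy in Multimodal Large Language Models with Multiple Vision Encoders | Reveals redundancy in MLLMs with multiple vision encoders and proposes utilization and information-gap metrics for diagnosis. | large language model, multimodal | |
| 2 | ChestGPT: Integrating Large Language Models and Vision Transformers for Disease Detection and Localization in Chest X-Rays | ChestGPT: a framework that combines an LLM and a ViT for disease detection and localization in chest X-rays. | large language model | |
| 3 | Sign Spotting Disambiguation using Large Language Models | Proposes a training-free, LLM-based disambiguation framework for sign spotting that improves sign spotting quality. | large language model | |
| 4 | Dynamic Multimodal Prototype Learning in Vision-Language Models | Proposes ProtoMM, which improves test-time adaptation of vision-language models through dynamic multimodal prototype learning. | multimodal | |
| 5 | Causal-SAM-LLM: Large Language Models as Causal Reasoners for Robust Medical Segmentation | Causal-SAM-LLM: uses large language models for causal reasoning to improve the robustness of medical segmentation. | large language model | |
| 6 | Multimodal Alignment with Cross-Attentive GRUs for Fine-Grained Video Understanding | Proposes a multimodal alignment framework based on cross-attentive GRUs for fine-grained video understanding. | multimodal | |
| 7 | MolVision: Molecular Property Prediction with Vision Language Models | MolVision: uses vision-language models for molecular property prediction, improving predictive performance and generalization. | large language model, multimodal | |
| 8 | Beyond Accuracy: Metrics that Uncover What Makes a 'Good' Visual Descriptor | Proposes global alignment and CLIP similarity metrics that evaluate visual descriptor quality beyond conventional accuracy. | foundation model | |
| 9 | Unlearning the Noisy Correspondence Makes CLIP More Robust | Proposes the NCU framework, which makes CLIP more robust by disentangling noisy correspondences. | zero-shot transfer | |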

🔬 Pillar 2: RL & Architecture (7 papers)

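| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 10 | FastDINOv2: Frequency Based Curriculum Learning Improves Robustness and Training Speed | FastDINOv2: frequency-based curriculum learning speeds up DINOv2 pretraining and improves robustness. | curriculum learning, foundation model | |
| 11 | Subject Invariant Contrastive Learning for Human Activity Recognition | Proposes Subject Invariant Contrastive Learning (SICL) to improve the generalization of human activity recognition. | contrastive learning, multimodal | |
| 12 | Helping CLIP See Both the Forest and the Trees: A Decomposition and Description Approach | Proposes a decomposition and description approach (D&D) that improves CLIP's perception of local semantics. | contrastive learning, large language model | |
| 13 | Task-Specific Generative Dataset Distillation with Difficulty-Guided Sampling | Proposes a task-specific generative dataset distillation method with difficulty-guided sampling that improves classification performance. | distillation | |
| 14 | Source-Free Domain Adaptation via Multi-view Contrastive Learning | Proposes a source-free domain adaptation method based on multi-view contrastive learning that improves pseudo-label quality. | contrastive learning | |
| 15 | Dual-frequency Selected Knowledge Distillation with Statistical-based Sample Rectification for PolSAR Image Classification | Proposes SKDNet-SSR, which improves dual-frequency PolSAR image classification accuracy through statistical rectification and knowledge distillation. | distillation | |
| 16 | StreamDiT: Real-Time Streaming Text-to-Video Generation | StreamDiT: a real-time text-to-video generation method based on a streaming diffusion model, reaching 16 FPS at 512p resolution. | flow matching, distillation | |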

🔬 Pillar 3: Perception & Semantics (4 papers)

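| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 17 | Open-Vocabulary Object Detection in UAV Imagery: A Review and Future Perspectives | Review of open-vocabulary object detection methods for UAV imagery, with an analysis of challenges and future perspectives. | scene understanding, open-vocabulary, open vocabulary | |
| 18 | Leveraging Out-of-Distribution Unlabeled Images: Semi-Supervised Semantic Segmentation with an Open-Vocabulary Model | Proposes the SemiOVS framework, which uses an open-vocabulary model to effectively improve semi-supervised semantic segmentation performance on OOD data. | open-vocabulary, open vocabulary | |
| 19 | Outdoor Monocular SLAM with Global Scale-Consistent 3D Gaussian Pointmaps | S3PO-GS: outdoor monocular SLAM based on global scale-consistent 3D Gaussian pointmaps. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 20 | Radar Velocity Transformer: Single-scan Moving Object Segmentation in Noisy Radar Point Clouds | Proposes Radar Velocity Transformer for single-scan moving object segmentation in radar point clouds. | scene understanding | |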

🔬 Pillar 1: Robot Control (2 papers)

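| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 21 | Bridging Domain Generalization to Multimodal Domain Generalization via Unified Representations | Proposes a multimodal domain generalization method based on unified representations, addressing inconsistent generalization directions across modalities. | manipulation, representation learning, multimodal | |
| 22 | SecureT2I: No More Unauthorized Manipulation on AI Generated Images from Prompts | SecureT2I: prevents unauthorized editing of images generated by diffusion models. | manipulation | |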

🔬 Pillar 7: Motion Retargeting (1 paper)

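| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 23 | ConceptMix++: Leveling the Playing Field in Text-to-Image Benchmarking via Iterative Prompt Optimization | ConceptMix++ uses iterative prompt optimization to make benchmarking of text-to-image generation models fairer. | spatial relationship, multimodal | |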
