cs.CV (2025-07-04)

📊 23 papers in total | 🔗 7 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (9 🔗1) · Pillar 2: RL & Architecture (7 🔗3) · Pillar 3: Perception & Semantics (4 🔗3) · Pillar 1: Robot Control (2) · Pillar 7: Motion Retargeting (1)

🔬 Pillar 9: Embodied Foundation Models (9 papers)

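#	Title	One-line summary	Tags	🔗	⭐
1	Investigating Redundancy in Multimodal Large Language Models with Multiple Vision Encoders	Reveals redundancy in MLLMs with multiple vision encoders and proposes utilization and information-gap metrics to diagnose it.	large language model multimodal
2	ChestGPT: Integrating Large Language Models and Vision Transformers for Disease Detection and Localization in Chest X-Rays	ChestGPT: a framework that combines an LLM and a ViT for disease detection and localization in chest X-rays.	large language model
3	Sign Spotting Disambiguation using Large Language Models	Proposes a training-free, LLM-based disambiguation framework for sign spotting that improves sign spotting quality.	large language model
4	Dynamic Multimodal Prototype Learning in Vision-Language Models	Proposes ProtoMM, which improves test-time adaptation of vision-language models through dynamic multimodal prototype learning.	multimodal
5	Causal-SAM-LLM: Large Language Models as Causal Reasoners for Robust Medical Segmentation	Causal-SAM-LLM: uses large language models for causal reasoning to make medical segmentation more robust.	large language model
6	Multimodal Alignment with Cross-Attentive GRUs for Fine-Grained Video Understanding	Proposes a multimodal alignment framework based on cross-attentive GRUs for fine-grained video understanding.	multimodal
7	MolVision: Molecular Property Prediction with Vision Language Models	MolVision: uses vision-language models for molecular property prediction, improving predictive performance and generalization.	large language model multimodal	✅
8	Beyond Accuracy: Metrics that Uncover What Makes a 'Good' Visual Descriptor	Proposes global alignment and CLIP similarity metrics for evaluating visual descriptor quality beyond conventional accuracy.	foundation model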
9	Unlearning the Noisy Correspondence Makes CLIP More Robust	Proposes the NCU framework, which makes CLIP more robust by unlearning noisy correspondences.	zero-shot transfer

🔬 Pillar 2: RL & Architecture (7 papers)

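#	Title	One-line summary	Tags	🔗	⭐
10	FastDINOv2: Frequency Based Curriculum Learning Improves Robustness and Training Speed	FastDINOv2: frequency-based curriculum learning speeds up DINOv2 pre-training and improves robustness.	curriculum learning foundation model	✅
11	Subject Invariant Contrastive Learning for Human Activity Recognition	Proposes Subject Invariant Contrastive Learning (SICL) to improve generalization in human activity recognition.	contrastive learning multimodal
12	Helping CLIP See Both the Forest and the Trees: A Decomposition and Description Approach	Proposes a decomposition and description approach (D&D) that improves CLIP's perception of local semantics.	contrastive learning large language model
13	Task-Specific Generative Dataset Distillation with Difficulty-Guided Sampling	Proposes a task-specific generative dataset distillation method with difficulty-guided sampling that improves classification performance.	distillation	✅
14	Source-Free Domain Adaptation via Multi-view Contrastive Learning	Proposes a source-free domain adaptation method based on multi-view contrastive learning that improves pseudo-label quality.	contrastive learning
15	Dual-frequency Selected Knowledge Distillation with Statistical-based Sample Rectification for PolSAR Image Classification	Proposes SKDNet-SSR, which improves dual-frequency PolSAR image classification accuracy through statistical rectification and knowledge distillation.	distillation
16	StreamDiT: Real-Time Streaming Text-to-Video Generation	StreamDiT: a streaming diffusion-based method for real-time text-to-video generation, reaching 16 FPS at 512p resolution.	flow matching distillation	✅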

🔬 Pillar 3: Perception & Semantics (4 papers)

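#	Title	One-line summary	Tags	🔗	⭐
17	Open-Vocabulary Object Detection in UAV Imagery: A Review and Future Perspectives	Review of open-vocabulary object detection methods for UAV imagery, analyzing challenges and future perspectives.	scene understanding open-vocabulary open vocabulary	✅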
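18	Leveraging Out-of-Distribution Unlabeled Images: Semi-Supervised Semantic Segmentation with an Open-Vocabulary Model	Proposes the SemiOVS framework, which uses an open-vocabulary model to exploit OOD unlabeled images and improve semi-supervised semantic segmentation.	open-vocabulary open vocabulary	✅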
19	Outdoor Monocular SLAM with Global Scale-Consistent 3D Gaussian Pointmaps	S3PO-GS: outdoor monocular SLAM based on global scale-consistent 3D Gaussian pointmaps.	3D gaussian splatting 3DGS gaussian splatting	✅
20	Radar Velocity Transformer: Single-scan Moving Object Segmentation in Noisy Radar Point Clouds	Proposes Radar Velocity Transformer for single-scan moving object segmentation in radar point clouds.	scene understanding

🔬 Pillar 1: Robot Control (2 papers)

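#	Title	One-line summary	Tags	🔗	⭐
21	Bridging Domain Generalization to Multimodal Domain Generalization via Unified Representations	Proposes a multimodal domain generalization method based on unified representations, addressing inconsistent generalization directions across modalities.	manipulation representation learning multimodal
22	SecureT2I: No More Unauthorized Manipulation on AI Generated Images from Prompts	SecureT2I: prevents unauthorized editing of images generated by diffusion models.	manipulation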

🔬 Pillar 7: Motion Retargeting (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
23	ConceptMix++: Leveling the Playing Field in Text-to-Image Benchmarking via Iterative Prompt Optimization	ConceptMix++ uses iterative prompt optimization to make benchmarking of text-to-image generation models fairer.	spatial relationship multimodal
