cs.CV(2025-07-18)

📊 26 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (9 🔗1)
Pillar 9: Embodied Foundation Models (8)
Pillar 2: RL & Architecture (4 🔗1)
Pillar 8: Physics-based Animation (2)
Pillar 1: Robot Control (2)
Pillar 6: Video Extraction (1)

🔬 Pillar 3: Perception & Semantics (9 papers)

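| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 1 | PCR-GS: COLMAP-Free 3D Gaussian Splatting via Pose Co-Regularizations | PCR-GS: COLMAP-free 3D Gaussian splatting via pose co-regularization. | 3D gaussian splatting, 3DGS, gaussian splatting |
| 2 | Depth3DLane: Fusing Monocular 3D Lane Detection with Self-Supervised Monocular Depth Estimation | Depth3DLane: monocular 3D lane detection fused with self-supervised monocular depth estimation. | depth estimation, monocular depth |
| 3 | Enhancing LiDAR Point Features with Foundation Model Priors for 3D Object Detection | Uses visual foundation model priors to enhance LiDAR point-cloud features and improve 3D object detection accuracy. | Depth Anything, foundation model |
| 4 | TimeNeRF: Building Generalizable Neural Radiance Fields across Time from Few-Shot Input Views | TimeNeRF: builds generalizable neural radiance fields across time from a few input views. | NeRF, neural radiance field |
| 5 | Semantic Segmentation based Scene Understanding in Autonomous Vehicles | Proposes a semantic-segmentation-based scene understanding model for autonomous vehicles and analyzes the effect of the backbone network. | scene understanding |
| 6 | EPSilon: Efficient Point Sampling for Lightening of Hybrid-based 3D Avatar Generation | Proposes EPSilon, an efficient point sampling method that speeds up training and inference of hybrid 3D avatar generation models. | NeRF, neural radiance field, SMPL |
| 7 | PositionIC: Unified Position and Identity Consistency for Image Customization | PositionIC: an image customization framework that unifies position and identity consistency. | NeRF, spatial relationship |
| 8 | Augmented Reality in Cultural Heritage: A Dual-Model Pipeline for 3D Artwork Reconstruction | Proposes a dual-model augmented reality pipeline for 3D artwork reconstruction in cultural heritage. | depth estimation, Depth Anything |
| 9 | Moving Object Detection from Moving Camera Using Focus of Expansion Likelihood and Segmentation | Proposes FoELS, which fuses optical flow and texture information to detect moving objects from a moving camera. | scene understanding, optical flow |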

🔬 Pillar 9: Embodied Foundation Models (8 papers)

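| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 10 | VLA-Mark: A cross modal watermark for large vision-language alignment model | Proposes VLA-Mark, a watermark embedding method based on cross-modal alignment that protects the intellectual property of vision-language models. | VLA, multimodal, visual grounding |
| 11 | Document Haystack: A Long Context Multimodal Image/Document Understanding Vision LLM Benchmark | Proposes the Document Haystack benchmark to evaluate the retrieval ability of VLMs in long-document multimodal understanding. | large language model, multimodal |
| 12 | Foundation Models as Class-Incremental Learners for Dermatological Image Classification | Uses a foundation model pretrained on skin lesions for incremental learning on dermatological images. | foundation model |
| 13 | Leveraging Pathology Foundation Models for Panoptic Segmentation of Melanoma in H&E Images | Uses pathology foundation models for panoptic segmentation of melanoma in H&E images. | foundation model |
| 14 | SkySense V2: A Unified Foundation Model for Multi-modal Remote Sensing | SkySense V2: a unified multimodal remote sensing foundation model that improves parameter efficiency and adaptability to remote sensing data. | foundation model |
| 15 | Hallucination Score: Towards Mitigating Hallucinations in Generative Image Super-Resolution | Proposes the Hallucination Score to mitigate hallucinations in generative super-resolution. | large language model, multimodal |
| 16 | Team of One: Cracking Complex Video QA with Model Synergy | Proposes a model-synergy framework that addresses reasoning depth and robustness in complex video QA. | large language model, multimodal |
| 17 | When Seeing Overrides Knowing: Disentangling Knowledge Conflicts in Vision-Language Models | Proposes a multimodal adversarial query dataset to analyze how vision-language models resolve knowledge conflicts. | multimodal |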

🔬 Pillar 2: RL & Architecture (4 papers)

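| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 18 | MaskHOI: Robust 3D Hand-Object Interaction Estimation via Masked Pre-training | MaskHOI: robust 3D hand-object interaction estimation via masked pre-training. | representation learning, masked autoencoder, MAE |
| 19 | Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning | Franca: nested Matryoshka clustering for scalable visual representation learning. | representation learning, foundation model |
| 20 | GRAM-MAMBA: Holistic Feature Alignment for Wireless Perception with Adaptive Low-Rank Compensation | Proposes GRAM-MAMBA, which achieves efficient and robust multimodal fusion for wireless perception through adaptive low-rank compensation. | Mamba, multimodal |
| 21 | Training-free Token Reduction for Vision Mamba | Proposes MTR, a training-free token reduction framework for Vision Mamba that improves computational efficiency. | Mamba |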

🔬 Pillar 8: Physics-based Animation (2 papers)

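| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 22 | CoTasks: Chain-of-Thought based Video Instruction Tuning Tasks | Proposes the CoTasks framework to strengthen the chain-of-thought reasoning of VideoLLMs on fine-grained video understanding. | spatiotemporal, large language model, chain-of-thought |
| 23 | Localized FNO for Spatiotemporal Hemodynamic Upsampling in Aneurysm MRI | Proposes a localized Fourier neural operator (LoFNO) for spatiotemporal super-resolution of hemodynamics in aneurysm MRI. | spatiotemporal |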

🔬 Pillar 1: Robot Control (2 papers)

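| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 24 | Moodifier: MLLM-Enhanced Emotion-Driven Image Editing | Moodifier: MLLM-enhanced, emotion-driven image editing with precise emotional control and preserved content integrity. | manipulation, large language model, multimodal |
| 25 | Tackling fake images in cybersecurity -- Interpretation of a StyleGAN and lifting its black-box | Analyzes the internal mechanisms of StyleGAN and reveals potential cybersecurity risks of AI-generated images. | manipulation |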

🔬 Pillar 6: Video Extraction (1 paper)

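| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 26 | C-DOG: Multi-View Multi-instance Feature Association Using Connected δ-Overlap Graphs | Proposes the C-DOG algorithm, which uses geometric constraints to solve multi-view, multi-instance feature association and is suited to high-density scenes. | feature matching |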
