cs.CV (2025-07-18)

📊 26 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (9 🔗1) | Pillar 9: Embodied Foundation Models (8) | Pillar 2: RL Algorithms & Architecture (4 🔗1) | Pillar 8: Physics-based Animation (2) | Pillar 1: Robot Control (2) | Pillar 6: Video Extraction & Matching (1)

🔬 Pillar 3: Spatial Perception & Semantics (9 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
1	PCR-GS: COLMAP-Free 3D Gaussian Splatting via Pose Co-Regularizations	PCR-GS: COLMAP-free 3D Gaussian Splatting via pose co-regularization	3D gaussian splatting 3DGS gaussian splatting
2	Depth3DLane: Fusing Monocular 3D Lane Detection with Self-Supervised Monocular Depth Estimation	Depth3DLane: monocular 3D lane detection fused with self-supervised monocular depth estimation	depth estimation monocular depth
3	Enhancing LiDAR Point Features with Foundation Model Priors for 3D Object Detection	Enhances LiDAR point cloud features with vision foundation model priors to improve 3D object detection accuracy	Depth Anything foundation model
4	TimeNeRF: Building Generalizable Neural Radiance Fields across Time from Few-Shot Input Views	TimeNeRF: builds generalizable neural radiance fields across time from few-shot input views	NeRF neural radiance field
5	Semantic Segmentation based Scene Understanding in Autonomous Vehicles	Proposes a semantic-segmentation-based scene understanding model for autonomous vehicles and analyzes the effect of the backbone network	scene understanding
6	EPSilon: Efficient Point Sampling for Lightening of Hybrid-based 3D Avatar Generation	Proposes EPSilon, an efficient point sampling method that speeds up training and inference of hybrid 3D avatar generation models	NeRF neural radiance field SMPL	✅
7	PositionIC: Unified Position and Identity Consistency for Image Customization	PositionIC: an image customization framework that unifies position and identity consistency	NeRF spatial relationship
8	Augmented Reality in Cultural Heritage: A Dual-Model Pipeline for 3D Artwork Reconstruction	Proposes a dual-model augmented reality pipeline for 3D artwork reconstruction in cultural heritage	depth estimation Depth Anything
9	Moving Object Detection from Moving Camera Using Focus of Expansion Likelihood and Segmentation	Proposes FoELS, which fuses optical flow and texture information to detect moving objects from a moving camera	scene understanding optical flow

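🔬 Pillar 9: Embodied Foundation Models (8 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
10	VLA-Mark: A cross modal watermark for large vision-language alignment model	Proposes VLA-Mark, a watermark embedding method based on cross-modal alignment, to protect the intellectual property of vision-language models	VLA multimodal visual grounding
11	Document Haystack: A Long Context Multimodal Image/Document Understanding Vision LLM Benchmark	Proposes the Document Haystack benchmark to evaluate the retrieval ability of VLMs in long-document multimodal understanding	large language model multimodal
12	Foundation Models as Class-Incremental Learners for Dermatological Image Classification	Uses a foundation model pretrained on skin lesions for incremental learning on dermatological images	foundation model
13	Leveraging Pathology Foundation Models for Panoptic Segmentation of Melanoma in H&E Images	Uses pathology foundation models for panoptic segmentation of melanoma in H&E images	foundation model
14	SkySense V2: A Unified Foundation Model for Multi-modal Remote Sensing	SkySense V2: a unified multi-modal remote sensing foundation model with improved parameter efficiency and adaptability to remote sensing data	foundation model
15	Hallucination Score: Towards Mitigating Hallucinations in Generative Image Super-Resolution	Proposes the Hallucination Score to mitigate hallucinations in generative super-resolution	large language model multimodal
16	Team of One: Cracking Complex Video QA with Model Synergy	Proposes a model-synergy framework to address reasoning depth and robustness in complex video question answering	large language model multimodal
17	When Seeing Overrides Knowing: Disentangling Knowledge Conflicts in Vision-Language Models	Proposes a multimodal adversarial query dataset to analyze how vision-language models resolve knowledge conflicts	multimodal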

🔬 Pillar 2: RL Algorithms & Architecture (4 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
18	MaskHOI: Robust 3D Hand-Object Interaction Estimation via Masked Pre-training	MaskHOI: robust 3D hand-object interaction estimation via masked pre-training	representation learning masked autoencoder MAE
19	Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning	Franca: nested Matryoshka clustering for scalable visual representation learning	representation learning foundation model	✅
20	GRAM-MAMBA: Holistic Feature Alignment for Wireless Perception with Adaptive Low-Rank Compensation	Proposes GRAM-MAMBA, which achieves efficient and robust multimodal fusion for wireless perception through adaptive low-rank compensation	Mamba multimodal
21	Training-free Token Reduction for Vision Mamba	Proposes MTR, a training-free token reduction framework for Vision Mamba that improves computational efficiency	Mamba

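🔬 Pillar 8: Physics-based Animation (2 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
22	CoTasks: Chain-of-Thought based Video Instruction Tuning Tasks	Proposes the CoTasks framework to strengthen VideoLLM chain-of-thought reasoning for fine-grained video understanding	spatiotemporal large language model chain-of-thought
23	Localized FNO for Spatiotemporal Hemodynamic Upsampling in Aneurysm MRI	Proposes a localized Fourier neural operator (LoFNO) for spatiotemporal super-resolution of hemodynamics in aneurysm MRI	spatiotemporal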

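🔬 Pillar 1: Robot Control (2 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
24	Moodifier: MLLM-Enhanced Emotion-Driven Image Editing	Moodifier: MLLM-enhanced emotion-driven image editing with precise emotional control and preserved content integrity	manipulation large language model multimodal
25	Tackling fake images in cybersecurity -- Interpretation of a StyleGAN and lifting its black-box	Analyzes the internal mechanisms of StyleGAN and reveals potential cybersecurity risks of AI-generated images	manipulation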

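🔬 Pillar 6: Video Extraction & Matching (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
26	C-DOG: Multi-View Multi-instance Feature Association Using Connected δ-Overlap Graphs	Proposes the C-DOG algorithm, which uses geometric constraints to solve multi-view multi-instance feature association and is suited to high-density scenes	feature matching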
