cs.CV (2025-04-18)

📊 23 papers in total | 🔗 8 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (8 🔗1) · Pillar 2: RL & Architecture (7 🔗4) · Pillar 9: Embodied Foundation Models (6 🔗2) · Pillar 1: Robot Control (1 🔗1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 3: Perception & Semantics (8 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	EG-Gaussian: Epipolar Geometry and Graph Network Enhanced 3D Gaussian Splatting	Proposes EG-Gaussian, which uses epipolar geometry and graph networks to improve 3D Gaussian Splatting reconstruction	3D gaussian splatting 3DGS gaussian splatting
2	HAECcity: Open-Vocabulary Scene Understanding of City-Scale Point Clouds with Superpoint Graph Clustering	Proposes HAECcity, which uses superpoint graph clustering for open-vocabulary scene understanding of city-scale point clouds	scene understanding open-vocabulary open vocabulary
3	Scaling LLaNA: Advancing NeRF-Language Understanding Through Large-Scale Training	LLaNA: improves language understanding of NeRFs through large-scale training	NeRF neural radiance field large language model
4	Visual Intention Grounding for Egocentric Assistants	Proposes the EgoIntention dataset and the Reason-to-Ground method for intention-driven visual grounding from an egocentric viewpoint	affordance egocentric multimodal
5	Enhancing Pothole Detection and Characterization: Integrated Segmentation and Depth Estimation in Road Anomaly Systems	Proposes a road anomaly detection system that combines segmentation and depth estimation to improve pothole detection and characterization	depth estimation
6	Leveraging Automatic CAD Annotations for Supervised Learning in 3D Scene Understanding	Uses automatic CAD annotations to improve supervised learning for 3D scene understanding	scene understanding
7	MicroFlow: Domain-Specific Optical Flow for Ground Deformation Estimation in Seismic Events	Proposes MicroFlow for estimating ground deformation in seismic events	optical flow	✅
8	Occlusion-Ordered Semantic Instance Segmentation	Proposes an occlusion-ordered semantic instance segmentation method to improve 3D scene understanding	depth estimation monocular depth

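🔬 Pillar 2: RL & Architecture (7 papers)

#	Title	One-line summary	Tags	🔗	⭐
9	CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning	CheXWorld: builds a world model for radiographs to improve representation learning	world model representation learning foundation model	✅
10	LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models	LoftUp: learns a coordinate-based feature upsampler to improve pixel-level understanding in vision foundation models	distillation foundation model	✅
11	Compile Scene Graphs with Reinforcement Learning	Proposes R1-SGG, which uses reinforcement learning to compile scene graphs, significantly improving multimodal large language models on scene graph generation	reinforcement learning large language model multimodal	✅
12	CytoFM: The first cytology foundation model	Proposes CytoFM, the first self-supervised pretrained model for cytology, improving cytology image analysis	distillation foundation model
13	U-Shape Mamba: State Space Model for faster diffusion	Proposes U-Shape Mamba (USM), which speeds up diffusion models and improves image generation quality	Mamba state space model
14	WeatherGen: A Unified Diverse Weather Generator for LiDAR Point Clouds via Spider Mamba Diffusion	WeatherGen: proposes a unified framework, based on Spider Mamba Diffusion, for generating LiDAR point clouds under diverse weather	Mamba contrastive learning	✅
15	VideoPASTA: 7K Preference Pairs That Matter for Video-LLM Alignment	VideoPASTA: improves the spatiotemporal understanding of video-language models through alignment on 7K preference pairs	direct preference optimization spatial relationship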

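🔬 Pillar 9: Embodied Foundation Models (6 papers)

#	Title	One-line summary	Tags	🔗	⭐
16	Chain-of-Evidence Multimodal Reasoning for Few-shot Temporal Action Localization	Proposes a chain-of-evidence multimodal reasoning method for few-shot temporal action localization	large language model multimodal	✅
17	Fashion-RAG: Multimodal Fashion Image Editing via Retrieval-Augmented Generation	Proposes Fashion-RAG, which performs multimodal fashion image editing via retrieval-augmented generation	multimodal
18	SatelliteCalculator: A Multi-Task Vision Foundation Model for Quantitative Remote Sensing Inversion	Proposes SatelliteCalculator, a multi-task vision foundation model for quantitative remote sensing inversion	foundation model
19	Towards a Multi-Agent Vision-Language System for Zero-Shot Novel Hazardous Object Detection for Autonomous Driving Safety	Proposes a multi-agent vision-language system for zero-shot detection of novel hazardous objects in autonomous driving	large language model multimodal	✅
20	Zero-Shot Industrial Anomaly Segmentation with Image-Aware Prompt Generation	Proposes IAP-AS, which achieves zero-shot industrial anomaly segmentation through image-aware prompt generation	large language model
21	Mono3R: Exploiting Monocular Cues for Geometric 3D Reconstruction	Mono3R: uses monocular cues to strengthen geometric 3D reconstruction, improving performance in weakly textured and low-light scenes	foundation model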

🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
22	DanceText: A Training-Free Layered Framework for Controllable Multilingual Text Transformation in Images	DanceText: a training-free layered framework for controllable multilingual text transformation in images	manipulation	✅

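🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
23	Analysing the Robustness of Vision-Language-Models to Common Corruptions	Analyzes the robustness of vision-language models to common image corruptions, revealing the frequency bias of Transformers	PULSE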
