cs.CV (2025-04-18)

📊 23 papers in total | 🔗 8 with code

🎯 Navigation by Interest Area

Pillar 3: Perception & Semantics (8 🔗1) · Pillar 2: RL & Architecture (7 🔗4) · Pillar 9: Embodied Foundation Models (6 🔗2) · Pillar 1: Robot Control (1 🔗1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 3: Perception & Semantics (8 papers)

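| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | EG-Gaussian: Epipolar Geometry and Graph Network Enhanced 3D Gaussian Splatting | Proposes EG-Gaussian, which uses epipolar geometry and graph networks to improve 3D Gaussian Splatting reconstruction. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 2 | HAECcity: Open-Vocabulary Scene Understanding of City-Scale Point Clouds with Superpoint Graph Clustering | Proposes HAECcity, which achieves open-vocabulary scene understanding of city-scale point clouds through superpoint graph clustering. | scene understanding, open-vocabulary, open vocabulary | |
| 3 | Scaling LLaNA: Advancing NeRF-Language Understanding Through Large-Scale Training | LLaNA: improves NeRF-language understanding through large-scale training. | NeRF, neural radiance field, large language model | |
| 4 | Visual Intention Grounding for Egocentric Assistants | Proposes the EgoIntention dataset and the Reason-to-Ground method to address intention-driven visual grounding from an egocentric perspective. | affordance, egocentric, multimodal | |
| 5 | Enhancing Pothole Detection and Characterization: Integrated Segmentation and Depth Estimation in Road Anomaly Systems | Proposes a road anomaly detection system that combines segmentation with depth estimation to improve pothole detection and characterization. | depth estimation | |
| 6 | Leveraging Automatic CAD Annotations for Supervised Learning in 3D Scene Understanding | Uses automatic CAD annotations to improve supervised learning performance in 3D scene understanding. | scene understanding | |
| 7 | MicroFlow: Domain-Specific Optical Flow for Ground Deformation Estimation in Seismic Events | Proposes MicroFlow to address ground deformation estimation in seismic events. | optical flow | |
| 8 | Occlusion-Ordered Semantic Instance Segmentation | Proposes a semantic instance segmentation method based on occlusion order to improve 3D scene understanding. | depth estimation, monocular depth | |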

🔬 Pillar 2: RL & Architecture (7 papers)

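| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 9 | CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning | CheXWorld: builds a world model for radiographs to improve representation learning. | world model, representation learning, foundation model | |
| 10 | LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models | LoftUp: learns a coordinate-based feature upsampler to improve the pixel-level understanding of vision foundation models. | distillation, foundation model | |
| 11 | Compile Scene Graphs with Reinforcement Learning | Proposes R1-SGG, which uses reinforcement learning to compile scene graphs, significantly improving multimodal large language models on scene graph generation. | reinforcement learning, large language model, multimodal | |
| 12 | CytoFM: The first cytology foundation model | Proposes CytoFM, the first self-supervised pretrained model for cytology, improving cytology image analysis. | distillation, foundation model | |
| 13 | U-Shape Mamba: State Space Model for faster diffusion | Proposes U-Shape Mamba (USM), which accelerates diffusion models and improves image generation quality. | Mamba, state space model | |
| 14 | WeatherGen: A Unified Diverse Weather Generator for LiDAR Point Clouds via Spider Mamba Diffusion | WeatherGen: proposes a unified framework based on Spider Mamba Diffusion for generating LiDAR point clouds under diverse weather. | Mamba, contrastive learning | |
| 15 | VideoPASTA: 7K Preference Pairs That Matter for Video-LLM Alignment | VideoPASTA: improves the spatio-temporal understanding of video-language models through alignment on 7K preference pairs. | direct preference optimization, spatial relationship | |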

🔬 Pillar 9: Embodied Foundation Models (6 papers)

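| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 16 | Chain-of-Evidence Multimodal Reasoning for Few-shot Temporal Action Localization | Proposes a chain-of-evidence multimodal reasoning method for few-shot temporal action localization. | large language model, multimodal | |
| 17 | Fashion-RAG: Multimodal Fashion Image Editing via Retrieval-Augmented Generation | Proposes Fashion-RAG, which performs multimodal fashion image editing via retrieval-augmented generation. | multimodal | |
| 18 | SatelliteCalculator: A Multi-Task Vision Foundation Model for Quantitative Remote Sensing Inversion | Proposes SatelliteCalculator, a multi-task vision foundation model for quantitative remote sensing inversion. | foundation model | |
| 19 | Towards a Multi-Agent Vision-Language System for Zero-Shot Novel Hazardous Object Detection for Autonomous Driving Safety | Proposes a multi-agent vision-language system for zero-shot detection of novel hazardous objects in autonomous driving. | large language model, multimodal | |
| 20 | Zero-Shot Industrial Anomaly Segmentation with Image-Aware Prompt Generation | Proposes IAP-AS, which achieves zero-shot industrial anomaly segmentation through image-aware prompt generation. | large language model | |
| 21 | Mono3R: Exploiting Monocular Cues for Geometric 3D Reconstruction | Mono3R: uses monocular cues to strengthen geometric 3D reconstruction, improving performance in weakly textured and low-light scenes. | foundation model | |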

🔬 Pillar 1: Robot Control (1 paper)

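| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 22 | DanceText: A Training-Free Layered Framework for Controllable Multilingual Text Transformation in Images | DanceText: a training-free layered framework for controllable multilingual text transformation in images. | manipulation | |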

🔬 Pillar 8: Physics-based Animation (1 paper)

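| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 23 | Analysing the Robustness of Vision-Language-Models to Common Corruptions | Analyzes the robustness of vision-language models to common image corruptions, revealing the frequency bias of Transformers. | PULSE | |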
