cs.CV (2025-06-07)

📊 21 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (8 🔗1) · Pillar 9: Embodied Foundation Models (7 🔗1) · Pillar 2: RL Algorithms & Architecture (6 🔗1)

🔬 Pillar 3: Spatial Perception & Semantics (8 papers)

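# | Title | One-line summary | Tags | 🔗
1 | Hi-LSplat: Hierarchical 3D Language Gaussian Splatting | Proposes Hi-LSplat to address view inconsistency and hierarchical semantic understanding in 3D language Gaussian splatting. | 3DGS, gaussian splatting, splatting
2 | Multi-StyleGS: Stylizing Gaussian Splatting with Multiple Styles | Multi-StyleGS: proposes a multi-style Gaussian splatting method for efficient, controllable 3D scene stylization. | 3D gaussian splatting, gaussian splatting, splatting
3 | SPC to 3D: Novel View Synthesis from Binary SPC via I2I translation | Proposes a two-stage framework based on I2I translation that synthesizes high-quality novel views from binary SPC images. | 3DGS, gaussian splatting, splatting
4 | Gaussian Mapping for Evolving Scenes | Proposes a Gaussian-mapping-based method for modeling dynamic scenes, addressing the reconstruction of scenes that evolve over the long term. | 3D gaussian splatting, 3DGS, gaussian splatting
5 | Parametric Gaussian Human Model: Generalizable Prior for Efficient and Realistic Human Avatar Modeling | Proposes a parametric Gaussian human model for efficient, realistic human avatar modeling. | 3D gaussian splatting, 3DGS, gaussian splatting
6 | PhysLab: A Benchmark Dataset for Multi-Granularity Visual Parsing of Physics Experiments | PhysLab: a benchmark dataset for multi-granularity visual parsing of physics experiments. | scene understanding, human-object interaction, HOI
7 | Dark Channel-Assisted Depth-from-Defocus from a Single Image | Proposes a dark-channel-assisted method for single-image depth from defocus, improving reconstruction of scene structure. | depth estimation
8 | EV-LayerSegNet: Self-supervised Motion Segmentation using Event Cameras | EV-LayerSegNet: a self-supervised motion segmentation network based on event cameras. | optical flow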

🔬 Pillar 9: Embodied Foundation Models (7 papers)

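# | Title | One-line summary | Tags | 🔗
9 | Sleep Stage Classification using Multimodal Embedding Fusion from EOG and PSM | Fuses multimodal embeddings from EOG and PSM for sleep staging, improving the accuracy of at-home sleep monitoring. | multimodal
10 | EndoARSS: Adapting Spatially-Aware Foundation Model for Efficient Activity Recognition and Semantic Segmentation in Endoscopic Surgery | EndoARSS: uses a spatially-aware foundation model for efficient activity recognition and semantic segmentation in endoscopic surgery. | foundation model
11 | RecipeGen: A Step-Aligned Multimodal Benchmark for Real-World Recipe Generation | RecipeGen: proposes a step-aligned multimodal benchmark for real-world recipe generation. | multimodal
12 | Mitigating Object Hallucination via Robust Local Perception Search | Proposes Local Perception Search (LPS), which effectively mitigates object hallucination in multimodal large language models. | large language model, multimodal
13 | Reading in the Dark with Foveated Event Vision | Proposes a gaze-guided event-camera OCR method that tackles text recognition on smart glasses under low light and fast motion. | multimodal
14 | How Important are Videos for Training Video LLMs? | Studies the importance of image data in training video LLMs, revealing that video data is underutilized. | large language model
15 | Stepwise Decomposition and Dual-stream Focus: A Novel Approach for Training-free Camouflaged Object Segmentation | Proposes RDVP-MSD, a new training-free method for camouflaged object segmentation that significantly improves segmentation accuracy and efficiency. | chain-of-thought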

🔬 Pillar 2: RL Algorithms & Architecture (6 papers)

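# | Title | One-line summary | Tags | 🔗
16 | Flood-DamageSense: Multimodal Mamba with Multitask Learning for Building Flood Damage Assessment using SAR Remote Sensing Imagery | Flood-DamageSense: a framework combining multimodal Mamba and multitask learning for building flood damage assessment from SAR remote sensing imagery. | Mamba, multimodal
17 | Towards Streaming LiDAR Object Detection with Point Clouds as Egocentric Sequences | Proposes PFCF, a hybrid detector that balances speed and accuracy in streaming LiDAR object detection. | Mamba, SSM, representation learning
18 | Vision-EKIPL: External Knowledge-Infused Policy Learning for Visual Reasoning | Vision-EKIPL: policy learning infused with external knowledge to improve visual reasoning. | reinforcement learning, policy learning, large language model
19 | Position Prediction Self-Supervised Learning for Multimodal Satellite Imagery Semantic Segmentation | Proposes a multimodal satellite imagery semantic segmentation method based on position-prediction self-supervised learning. | masked autoencoder, MAE, multimodal
20 | Zero Shot Composed Image Retrieval | Improves zero-shot composed image retrieval by fine-tuning BLIP-2 and analyzing Retrieval-DPO. | DPO, direct preference optimization, multimodal
21 | THU-Warwick Submission for EPIC-KITCHEN Challenge 2025: Semi-Supervised Video Object Segmentation | A semi-supervised video object segmentation method combining SAM pre-training with depth information, improving segmentation accuracy in egocentric views. | visual pre-training, egocentric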
