cs.CV (2025-06-07)
📊 21 papers in total | 🔗 3 with code
🎯 Interest Area Navigation
Pillar 3: Spatial Perception and Semantics (8 🔗1)
Pillar 9: Embodied Foundation Models (7 🔗1)
Pillar 2: RL Algorithms and Architecture (6 🔗1)
🔬 Pillar 3: Spatial Perception and Semantics (8 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
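| 1 | Hi-LSplat: Hierarchical 3D Language Gaussian Splatting | Proposes Hi-LSplat to address view inconsistency and hierarchical semantic understanding in 3D language Gaussian splatting. | 3DGS, gaussian splatting, splatting | | |
| 2 | Multi-StyleGS: Stylizing Gaussian Splatting with Multiple Styles | Multi-StyleGS: proposes a multi-style Gaussian splatting method for efficient, controllable 3D scene stylization. | 3D gaussian splatting, gaussian splatting, splatting | | |
| 3 | SPC to 3D: Novel View Synthesis from Binary SPC via I2I translation | Proposes a two-stage framework based on I2I translation that synthesizes high-quality novel-view images from binary SPC images. | 3DGS, gaussian splatting, splatting | | |
| 4 | Gaussian Mapping for Evolving Scenes | Proposes a Gaussian-mapping-based method for modeling dynamic scenes, addressing the reconstruction of scenes that evolve over long periods. | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 5 | Parametric Gaussian Human Model: Generalizable Prior for Efficient and Realistic Human Avatar Modeling | Proposes a parametric Gaussian human model for efficient, realistic human avatar modeling. | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 6 | PhysLab: A Benchmark Dataset for Multi-Granularity Visual Parsing of Physics Experiments | PhysLab: a benchmark dataset for multi-granularity visual parsing of physics experiments. | scene understanding, human-object interaction, HOI | ✅ | |
| 7 | Dark Channel-Assisted Depth-from-Defocus from a Single Image | Proposes a dark-channel-assisted method for depth-from-defocus estimation from a single image, improving scene structure reconstruction. | depth estimation | | |
| 8 | EV-LayerSegNet: Self-supervised Motion Segmentation using Event Cameras | EV-LayerSegNet: a self-supervised motion segmentation network based on event cameras. | optical flow | | |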
🔬 Pillar 9: Embodied Foundation Models (7 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
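| 9 | Sleep Stage Classification using Multimodal Embedding Fusion from EOG and PSM | Fuses multimodal embeddings from EOG and PSM for sleep staging, improving the accuracy of at-home sleep monitoring. | multimodal | | |
| 10 | EndoARSS: Adapting Spatially-Aware Foundation Model for Efficient Activity Recognition and Semantic Segmentation in Endoscopic Surgery | EndoARSS: uses a spatially-aware foundation model for efficient activity recognition and semantic segmentation in endoscopic surgery. | foundation model | | |
| 11 | RecipeGen: A Step-Aligned Multimodal Benchmark for Real-World Recipe Generation | RecipeGen: proposes a step-aligned multimodal benchmark for real-world recipe generation. | multimodal | | |
| 12 | Mitigating Object Hallucination via Robust Local Perception Search | Proposes Local Perception Search (LPS), which effectively mitigates object hallucination in multimodal large language models. | large language model, multimodal | | |
| 13 | Reading in the Dark with Foveated Event Vision | Proposes a gaze-guided event-camera OCR method that addresses text recognition on smart glasses in low light and under fast motion. | multimodal | | |
| 14 | How Important are Videos for Training Video LLMs? | A study of the importance of image data in Video LLM training, revealing that video data is underutilized. | large language model | | |
| 15 | Stepwise Decomposition and Dual-stream Focus: A Novel Approach for Training-free Camouflaged Object Segmentation | Proposes RDVP-MSD, a new training-free method for camouflaged object segmentation that significantly improves segmentation accuracy and efficiency. | chain-of-thought | ✅ | |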