cs.CV (2025-07-01)
📊 10 papers in total
🎯 Interest area navigation
Pillar 3: Spatial Perception and Semantics (4)
Pillar 1: Robot Control (3)
Pillar 2: RL Algorithms and Architecture (1)
Pillar 8: Physics-based Animation (1)
Pillar 6: Video Extraction and Matching (1)
🔬 Pillar 3: Spatial Perception and Semantics (4 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | GaussianVLM: Scene-centric 3D Vision-Language Models using Language-aligned Gaussian Splats for Embodied Reasoning and Beyond | Proposes GaussianVLM, a scene-centric 3D vision-language model that uses language-aligned Gaussian splats for embodied reasoning and related tasks. | gaussian splatting splatting scene understanding | ||
| 2 | GDGS: 3D Gaussian Splatting Via Geometry-Guided Initialization And Dynamic Density Control | Proposes geometry-guided initialization and dynamic density control to address rendering problems in 3D Gaussian Splatting. | 3D gaussian splatting 3DGS gaussian splatting | ||
| 3 | Populate-A-Scene: Affordance-Aware Human Video Generation | Proposes a human video generation model conditioned on a scene image to address interaction simulation. | affordance affordance-aware | ||
| 4 | PlantSegNeRF: A few-shot, cross-species method for plant 3D instance point cloud reconstruction via joint-channel NeRF with multi-view image instance matching | Proposes PlantSegNeRF to address insufficient accuracy in instance segmentation of plant point clouds. | NeRF neural radiance field | ||
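🔬 Pillar 1: Robot Control (3 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 5 | Towards Open-World Human Action Segmentation Using Graph Convolutional Networks | Proposes an open-world human action segmentation framework based on graph convolutional networks to detect and segment unseen actions. | bi-manual human-object interaction spatiotemporal | ||
| 6 | Geometry-aware 4D Video Generation for Robot Manipulation | Proposes a geometry-aware 4D video generation model that improves multi-view spatiotemporal consistency in robot manipulation. | manipulation | ||
| 7 | Multi-Modal Graph Convolutional Network with Sinusoidal Encoding for Robust Human Action Segmentation | Proposes a multimodal graph convolutional network with sinusoidal encoding that improves the robustness of action segmentation in human-robot collaboration. | bi-manual human-object interaction | ||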
🔬 Pillar 2: RL Algorithms and Architecture (1 paper)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | Gated Recursive Fusion: A Stateful Approach to Scalable Multimodal Transformers | Proposes Gated Recursive Fusion (GRF), which achieves scalable multimodal Transformers with linear complexity. | representation learning multimodal | ||
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 9 | CGEarthEye: A High-Resolution Remote Sensing Vision Foundation Model Based on the Jilin-1 Satellite Constellation | Proposes CGEarthEye to address the interpretation of high-resolution remote sensing imagery. | spatiotemporal foundation model | ||
🔬 Pillar 6: Video Extraction and Matching (1 paper)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 10 | Learning Dense Feature Matching via Lifting Single 2D Image to 3D Space | Proposes the L2M framework to address feature matching from single-view images. | feature matching | ||