cs.CV (2025-07-01)

📊 10 papers in total

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (4) · Pillar 1: Robot Control (3) · Pillar 2: RL Algorithms & Architecture (1) · Pillar 8: Physics-based Animation (1) · Pillar 6: Video Extraction & Matching (1)

🔬 Pillar 3: Spatial Perception & Semantics (4 papers)

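| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 1 | GaussianVLM: Scene-centric 3D Vision-Language Models using Language-aligned Gaussian Splats for Embodied Reasoning and Beyond | Proposes GaussianVLM, a scene-centric 3D vision-language model that uses language-aligned Gaussian splats for embodied reasoning and related tasks. | gaussian splatting, splatting, scene understanding |
| 2 | GDGS: 3D Gaussian Splatting Via Geometry-Guided Initialization And Dynamic Density Control | Proposes geometry-guided initialization and dynamic density control to address rendering problems in 3D Gaussian splatting. | 3D gaussian splatting, 3DGS, gaussian splatting |
| 3 | Populate-A-Scene: Affordance-Aware Human Video Generation | Proposes a human video generation model conditioned on a scene image to address interaction simulation. | affordance, affordance-aware |
| 4 | PlantSegNeRF: A few-shot, cross-species method for plant 3D instance point cloud reconstruction via joint-channel NeRF with multi-view image instance matching | Proposes PlantSegNeRF to address insufficient accuracy in instance segmentation of plant point clouds. | NeRF, neural radiance field |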

🔬 Pillar 1: Robot Control (3 papers)

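| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 5 | Towards Open-World Human Action Segmentation Using Graph Convolutional Networks | Proposes an open-world human action segmentation framework based on graph convolutional networks that detects and segments unseen actions. | bi-manual, human-object interaction, spatiotemporal |
| 6 | Geometry-aware 4D Video Generation for Robot Manipulation | Proposes a geometry-aware 4D video generation model that improves multi-view spatiotemporal consistency for robot manipulation. | manipulation |
| 7 | Multi-Modal Graph Convolutional Network with Sinusoidal Encoding for Robust Human Action Segmentation | Proposes a multi-modal graph convolutional network with sinusoidal encoding to improve the robustness of action segmentation in human-robot collaboration. | bi-manual, human-object interaction |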

🔬 Pillar 2: RL Algorithms & Architecture (1 paper)

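| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 8 | Gated Recursive Fusion: A Stateful Approach to Scalable Multimodal Transformers | Proposes Gated Recursive Fusion (GRF), which achieves scalable multimodal Transformers with linear complexity. | representation learning, multimodal |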

🔬 Pillar 8: Physics-based Animation (1 paper)

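| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 9 | CGEarthEye: A High-Resolution Remote Sensing Vision Foundation Model Based on the Jilin-1 Satellite Constellation | Proposes CGEarthEye to address the interpretation of high-resolution remote sensing imagery. | spatiotemporal, foundation model |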

🔬 Pillar 6: Video Extraction & Matching (1 paper)

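| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 10 | Learning Dense Feature Matching via Lifting Single 2D Image to 3D Space | Proposes the L2M framework to address feature matching from single-view images. | feature matching |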
