| 1 |
SE(3)-PoseFlow: Estimating 6D Pose Distributions for Uncertainty-Aware Robotic Manipulation |
Proposes SE(3)-PoseFlow for estimating 6D pose distributions, enabling uncertainty-aware robotic manipulation. |
manipulation grasp flow matching |
|
|
| 2 |
OmniVLA: Physically-Grounded Multimodal VLA with Unified Multi-Sensor Perception for Robotic Manipulation |
OmniVLA: a physically grounded multimodal VLA model for robotic manipulation with unified multi-sensor perception. |
manipulation |
|
|
| 3 |
TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning |
Proposes TIR-Bench to evaluate how well models use tools for image processing in agentic thinking-with-images reasoning. |
manipulation localization |
|
|
| 4 |
PixelVLA: Advancing Pixel-level Understanding in Vision-Language-Action Model |
PixelVLA: improves vision-language-action model performance through pixel-level understanding and multimodal prompting. |
manipulation scene understanding |
|
|
| 5 |
Web-Scale Collection of Video Data for 4D Animal Reconstruction |
Proposes the AiM dataset and baseline methods for 4D reconstruction of animals in the wild. |
quadruped pose estimation |
✅ |
|
| 6 |
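EVLP: Learning Unified Embodied Vision-Language Planner with Reinforced Supervised Fine-Tuning |
Proposes EVLP, which learns a unified embodied vision-language planner via reinforced supervised fine-tuning to address multimodal planning in long-horizon manipulation tasks. |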
manipulation |
|
|
| 7 |
A Generative Adversarial Approach to Adversarial Attacks Guided by Contrastive Language-Image Pre-trained Model |
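Proposes a generative adversarial attack method guided by a contrastive language-image pre-trained model, improving both attack effectiveness and visual fidelity. |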
manipulation |
|
|
| 8 |
Source-Only Cross-Weather LiDAR via Geometry-Aware Point Drop |
Proposes a geometry-aware point-drop adapter that improves LiDAR semantic segmentation in adverse weather. |
domain randomization |
|
|