| 1 |
KineST: A Kinematics-guided Spatiotemporal State Space Model for Human Motion Tracking from Sparse Signals |
KineST: a kinematics-guided spatiotemporal state space model for human motion tracking from sparse signals |
state space model representation learning human motion |
✅ |
|
| 2 |
BrepLLM: Native Boundary Representation Understanding with Large Language Models |
BrepLLM: the first large language model framework for native boundary representation (B-rep) understanding |
contrastive learning semantic mapping semantic map |
|
|
| 3 |
SNOW: Spatio-Temporal Scene Understanding with World Knowledge for Open-World Embodied Reasoning |
SNOW: a spatiotemporal scene understanding framework that integrates world knowledge for open-world embodied reasoning |
world model scene understanding multimodal |
|
|
| 4 |
AdaTooler-V: Adaptive Tool-Use for Images and Videos |
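Proposes AdaTooler-V, which uses adaptive tool use to improve the reasoning efficiency and performance of multimodal large language models on image and video tasks. |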
reinforcement learning large language model multimodal |
|
|
| 5 |
Instant Expressive Gaussian Head Avatar via 3D-Aware Expression Distillation |
Proposes an instant, highly expressive Gaussian head avatar method based on 3D-aware expression distillation |
distillation gaussian splatting splatting |
|
|
| 6 |
SARMAE: Masked Autoencoder for SAR Representation Learning |
Proposes SARMAE: a noise-aware masked autoencoder for SAR image representation learning |
representation learning masked autoencoder |
✅ |
|
| 7 |
4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation |
Proposes 4D-RGPT, which uses perceptual distillation to strengthen the region-level reasoning of MLLMs in 4D scene understanding. |
distillation multimodal |
|
|
| 8 |
The World is Your Canvas: Painting Promptable Events with Reference Images, Trajectories, and Text |
WorldCanvas: combines text, trajectories, and reference images to enable controllable simulation of world events. |
world model multimodal visual grounding |
✅ |
|
| 9 |
Task-Oriented Data Synthesis and Control-Rectify Sampling for Remote Sensing Semantic Segmentation |
Proposes the TODSynth framework for data synthesis and control optimization in remote sensing semantic segmentation. |
flow matching foundation model multimodal |
|
|
| 10 |
Predictive Modeling of Maritime Radar Data Using Transformer Architecture |
Explores the application of Transformers to predictive modeling of maritime radar data, addressing a gap in existing research |
predictive model spatiotemporal |
|
|
| 11 |
MACL: Multi-Label Adaptive Contrastive Learning Loss for Remote Sensing Image Retrieval |
Proposes MACL: a multi-label adaptive contrastive learning loss for remote sensing image retrieval |
representation learning contrastive learning |
✅ |
|
| 12 |
Skeleton-Snippet Contrastive Learning with Multiscale Feature Fusion for Action Localization |
Proposes an action localization method based on skeleton-snippet contrastive learning and multiscale feature fusion |
contrastive learning |
|
|
| 13 |
MomaGraph: State-Aware Unified Scene Graphs with Vision-Language Model for Embodied Task Planning |
Proposes MomaGraph, which uses a vision-language model to build state-aware unified scene graphs for embodied task planning. |
reinforcement learning scene understanding |
|
|