| 15 |
Learning Human Motion with Temporally Conditional Mamba |
Proposes a temporally conditional Mamba model that improves alignment and realism in temporal human motion generation tasks. |
Mamba motion generation human motion |
✅ |
|
| 16 |
On the Use of Hierarchical Vision Foundation Models for Low-Cost Human Mesh Recovery and Pose Estimation |
Uses hierarchical vision foundation models for low-cost human mesh recovery and pose estimation. |
Mamba human mesh recovery HMR |
✅ |
|
| 17 |
CompoDistill: Attention Distillation for Compositional Reasoning in Multimodal LLMs |
Proposes CompoDistill, which uses attention distillation to improve the compositional reasoning of multimodal LLMs. |
distillation large language model multimodal |
|
|
| 18 |
DeepMMSearch-R1: Empowering Multimodal LLMs in Multimodal Web Search |
Proposes DeepMMSearch-R1 to address information acquisition for multimodal LLMs in web search. |
reinforcement learning large language model multimodal |
|
|
| 19 |
SAIL-Embedding Technical Report: Omni-modal Embedding Foundation Model |
SAIL-Embedding: a general-purpose multimodal embedding foundation model for real-world scenarios. |
representation learning foundation model multimodal |
|
|
| 20 |
CoIRL-AD: Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving |
Proposes CoIRL-AD, a competitive imitation-reinforcement learning framework for autonomous driving. |
reinforcement learning imitation learning world model |
✅ |
|
| 21 |
Epistemic-aware Vision-Language Foundation Model for Fetal Ultrasound Interpretation |
Proposes FetalMind for fetal ultrasound report generation and diagnosis, improving multi-view reasoning and disease recognition. |
reinforcement learning foundation model |
✅ |
|
| 22 |
DriveVLA-W0: World Models Amplify Data Scaling Law in Autonomous Driving |
DriveVLA-W0: uses world models to amplify the data scaling law in autonomous driving. |
world model vision-language-action VLA |
|
|
| 23 |
CurriFlow: Curriculum-Guided Depth Fusion with Optical Flow-Based Temporal Alignment for 3D Semantic Scene Completion |
CurriFlow: depth fusion based on optical-flow temporal alignment and curriculum learning, for 3D semantic scene completion. |
curriculum learning stereo depth optical flow |
|
|
| 24 |
DRL: Discriminative Representation Learning with Parallel Adapters for Class Incremental Learning |
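Proposes the DRL framework, which uses parallel adapters and decoupled anchor supervision to address representation shift and inconsistency in class-incremental learning. |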
DRL representation learning |
|
|
| 25 |
One Dimensional CNN ECG Mamba for Multilabel Abnormality Classification in 12 Lead ECG |
Proposes a 1D CNN ECG Mamba model for multilabel abnormality classification in 12-lead ECG, significantly improving AUPRC and AUROC. |
Mamba state space model |
|
|
| 26 |
Dual Learning with Dynamic Knowledge Distillation and Soft Alignment for Partially Relevant Video Retrieval |
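Proposes a dual learning framework based on dynamic knowledge distillation and soft alignment for partially relevant video retrieval. |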
distillation |
✅ |
|
| 27 |
State Space Prompting via Gathering and Spreading Spatio-Temporal Information for Video Understanding |
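Proposes State Space Prompting (SSP), which improves video understanding performance by gathering and spreading spatio-temporal information. |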
state space model spatiotemporal |
|
|