| 1 |
KinTwin: Imitation Learning with Torque and Muscle Driven Biomechanical Models Enables Precise Replication of Able-Bodied and Impaired Movement from Markerless Motion Capture |
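KinTwin: uses imitation learning with torque- and muscle-driven biomechanical models to precisely replicate able-bodied and impaired movement from markerless motion capture |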
imitation learning markerless motion capture |
|
|
| 2 |
Unlocking the Potential of Difficulty Prior in RL-based Multimodal Reasoning |
Proposes a reinforcement learning method based on difficulty priors to improve multimodal reasoning |
reinforcement learning multimodal |
|
|
| 3 |
Mamba-Adaptor: State Space Model Adaptor for Visual Recognition |
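Proposes Mamba-Adaptor to address Mamba's shortcomings in global context modeling, long-range dependencies, and spatial structure modeling for visual recognition. |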
Mamba SSM state space model |
|
|
| 4 |
G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning |
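G1: bootstraps the perception and reasoning abilities of vision-language models via reinforcement learning, improving decision-making in interactive game environments. |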
reinforcement learning multimodal |
✅ |
|
| 5 |
SPKLIP: Aligning Spike Video Streams with Natural Language |
SPKLIP: proposes a new architecture for Spike video-language alignment that addresses the performance bottleneck caused by the modality gap. |
contrastive learning VLA multimodal |
|
|
| 6 |
AutoMat: Enabling Automated Crystal Structure Reconstruction from Microscopy via Agentic Tool Use |
AutoMat: automated crystal structure reconstruction from microscopy images via agentic tool use |
MAE large language model multimodal |
✅ |
|
| 7 |
BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation |
BusterX: proposes an MLLM-based framework for detecting and explaining AI-generated video forgeries, and builds the large-scale dataset GenBuster-200K. |
reinforcement learning large language model multimodal |
|
|
| 8 |
Few-Step Diffusion via Score identity Distillation |
Proposes the SiD framework, based on Score identity Distillation, to accelerate text-to-image generation models such as Stable Diffusion XL. |
distillation classifier-free guidance |
✅ |
|
| 9 |
Sat2Sound: A Unified Framework for Zero-Shot Soundscape Mapping |
Sat2Sound: a unified multimodal framework for zero-shot soundscape mapping |
representation learning contrastive learning multimodal |
|
|
| 10 |
Safe-Sora: Safe Text-to-Video Generation via Graphical Watermarking |
Safe-Sora: safe text-to-video generation via graphical watermarking |
Mamba state space model spatiotemporal |
✅ |
|
| 11 |
DD-Ranking: Rethinking the Evaluation of Dataset Distillation |
DD-Ranking: rethinks the evaluation of dataset distillation and proposes a fairer evaluation framework. |
distillation |
|
|
| 12 |
RMMSS: Towards Advanced Robust Multi-Modal Semantic Segmentation with Hybrid Prototype Distillation and Feature Selection |
RMMSS: proposes a hybrid prototype distillation and feature selection framework for robust multimodal semantic segmentation |
distillation |
|
|
| 13 |
Coarse Attribute Prediction with Task Agnostic Distillation for Real World Clothes Changing ReID |
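Proposes the RLQ framework, which uses coarse attribute prediction and task-agnostic distillation to improve the robustness of clothes-changing ReID in real-world scenarios. |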
distillation |
|
|
| 14 |
RoPECraft: Training-Free Motion Transfer with Trajectory-Guided RoPE Optimization on Diffusion Transformers |
RoPECraft: training-free video motion transfer for diffusion transformers based on trajectory-guided RoPE optimization |
flow matching optical flow |
|
|
| 15 |
Touch2Shape: Touch-Conditioned 3D Diffusion for Shape Exploration and Reconstruction |
Touch2Shape: proposes a touch-conditioned 3D diffusion model for shape exploration and reconstruction |
reinforcement learning reward design |
|
|
| 16 |
Towards Low-Latency Event Stream-based Visual Object Tracking: A Slow-Fast Approach |
Proposes SFTrack, a slow-fast approach to low-latency event stream-based visual object tracking |
representation learning distillation |
✅ |
|