| 14 | EgoCross: Benchmarking Multimodal Large Language Models for Cross-Domain Egocentric Video Question Answering | Proposes the EgoCross benchmark to address cross-domain egocentric video question answering. | reinforcement learning egocentric large language model | ✅ | |
| 15 | MAESTRO: Masked AutoEncoders for Multimodal, Multitemporal, and Multispectral Earth Observation Data | Proposes MAESTRO, which uses masked autoencoders to process multimodal, multitemporal, and multispectral Earth observation data. | masked autoencoder multimodal | ✅ | |
| 16 | HumanSense: From Multimodal Perception to Empathetic Context-Aware Responses through Reasoning MLLMs | Proposes the HumanSense benchmark for evaluating the perception and interaction capabilities of multimodal LLMs in human-centered scenarios. | reinforcement learning large language model multimodal | ✅ | |
| 17 | EgoMusic-driven Human Dance Motion Estimation with Skeleton Mamba | Proposes the EgoMusic Motion Network, built on Skeleton Mamba, for estimating human dance motion from egocentric video and music. | Mamba egocentric human motion | | |
| 18 | Hierarchical Fine-grained Preference Optimization for Physically Plausible Video Generation | PhysHPO: hierarchical fine-grained preference optimization for physically plausible video generation. | direct preference optimization physically plausible | | |
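| 19 | Trajectory-aware Shifted State Space Models for Online Video Super-Resolution | Proposes an online video super-resolution method based on trajectory-aware shifted state space models, improving the efficiency of spatiotemporal information aggregation. | Mamba SSM state space model | | |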
| 20 | BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Generation | Proposes the BLADE framework, which accelerates video generation by combining block-sparse attention with step distillation. | distillation spatiotemporal | | |
| 21 | From Diagnosis to Improvement: Probing Spatio-Physical Reasoning in Vision Language Models | Diagnoses and improves the spatio-physical reasoning abilities of vision language models. | reinforcement learning world model multimodal | | |
| 22 | VIFSS: View-Invariant and Figure Skating-Specific Pose Representation Learning for Temporal Action Segmentation | Proposes the VIFSS framework to address view invariance and data scarcity in the temporal action segmentation of figure skating jumps. | representation learning contrastive learning | | |
| 23 | Beyond conventional vision: RGB-event fusion for robust object detection in dynamic traffic scenarios | Proposes MCFNet, which fuses RGB images with event camera data to make object detection more robust in dynamic traffic scenes. | Mamba optical flow spatiotemporal | ✅ | |
| 24 | Integrating Reinforcement Learning with Visual Generative Models: Foundations and Advances | Survey on using reinforcement learning to improve the controllability and realism of visual generative models. | reinforcement learning | | |
| 25 | Integrating Reinforcement Learning with Visual Generative Models: Foundations and Advances | Combines reinforcement learning with visual generative models to optimize generation quality. | reinforcement learning | | |