| 1 |
AnchorWorld: Embodied Egocentric World Simulation with View-based Evolution Customization |
Proposes AnchorWorld to address controllability in interactive world modeling |
world model world models egocentric |
|
|
| 2 |
EgoPressDiff: Multimodal Video Diffusion for Egocentric UV-Domain Hand-Pressure Estimation |
Proposes EgoPressDiff to address hand contact pressure estimation |
MAE egocentric multimodal |
✅ |
|
| 3 |
Multi-FRuGaL: Multimodal Flexible Redundancy-aware Decomposed Gated Learning for Cancer Diagnosis and Prognosis |
Proposes the Multi-FRuGaL framework to address missing multimodal data in cancer diagnosis |
representation learning multimodal |
|
|
| 4 |
MemDreamer: Decoupling Perception and Reasoning for Long Video Understanding via Hierarchical Graph Memory and Agentic Retrieval Mechanism |
Proposes MemDreamer to address perception and reasoning in long video understanding |
dreamer spatiotemporal multimodal |
|
|
| 5 |
STREAM: Stochastic Riemannian Flow Matching with Anisotropic Decoder for Digital Histopathology Image Generation |
Proposes the STREAM framework to address condition collapse in digital pathology image generation |
flow matching foundation model |
|
|
| 6 |
VideoSEG-O3: A Multi-turn Reinforcement Learning Framework for Reasoning Video Object Segmentation |
Proposes the VideoSEG-O3 framework to address reasoning in video object segmentation |
reinforcement learning chain-of-thought |
✅ |
|
| 7 |
Lighting-Aware Representation Learning under Controllable Lighting Variation |
Proposes a lighting-aware representation learning framework to address lighting variation |
representation learning contrastive learning |
|
|
| 8 |
Native3D: End-to-End 3D Scene Generation via Unified Mesh-Texture Modeling and Semantic Alignment |
Proposes Native3D to address the 2D-adaptation problem in conventional 3D scene generation |
contrastive learning spatial relationship |
|
|