| 1 |
Pixel-Wise Multimodal Contrastive Learning for Remote Sensing Images |
Proposes PIMC, a pixel-wise multimodal contrastive learning method that improves time-series analysis of remote sensing images |
contrastive learning multimodal |
|
|
| 2 |
Semantic Belief-State World Model for 3D Human Motion Prediction |
Proposes the Semantic Belief-State World Model (SBWM) to address long-horizon drift in 3D human motion prediction. |
reinforcement learning world model latent dynamics |
|
|
| 3 |
Staged Voxel-Level Deep Reinforcement Learning for 3D Medical Image Segmentation with Noisy Annotations |
Proposes the SVL-DRL framework to address noisy annotations in medical image segmentation. |
reinforcement learning deep reinforcement learning DRL |
|
|
| 4 |
MVP: Enhancing Video Large Language Models via Self-supervised Masked Video Prediction |
Proposes MVP, which enhances video large language models through self-supervised masked video prediction |
reinforcement learning large language model |
|
|
| 5 |
From Brute Force to Semantic Insight: Performance-Guided Data Transformation Design with LLMs |
Proposes NNGPT, which uses LLMs and performance feedback to automatically design optimal data augmentation strategies. |
reinforcement learning large language model chain-of-thought |
|
|
| 6 |
REFA: Real-time Egocentric Facial Animations for Virtual Reality |
Proposes a calibration-free, real-time facial animation system based on the infrared cameras inside a VR headset. |
distillation egocentric |
|
|
| 7 |
Thinking with Frames: Generative Video Distortion Evaluation via Frame Reward Model |
Proposes REACT, a framework based on a frame reward model for evaluating structural distortion in generated videos |
reinforcement learning chain-of-thought |
|
|
| 8 |
CrackSegFlow: Controllable Flow-Matching Synthesis for Generalizable Crack Segmentation with the CSF-50K Benchmark |
Proposes CrackSegFlow, which, together with the CSF-50K benchmark, improves the generalizability and controllability of crack segmentation |
flow matching |
|
|
| 9 |
ToTMNet: FFT-Accelerated Toeplitz Temporal Mixing Network for Lightweight Remote Photoplethysmography |
Proposes ToTMNet, which uses an FFT-accelerated Toeplitz temporal mixing network for lightweight remote photoplethysmography estimation. |
MAE PULSE |
|
|
| 10 |
Diffusion-DRF: Differentiable Reward Flow for Video Diffusion Fine-Tuning |
Proposes Diffusion-DRF to address the reward-signal problem in fine-tuning video diffusion models |
DPO direct preference optimization |
|
|
| 11 |
Mind the Generative Details: Direct Localized Detail Preference Optimization for Video Diffusion Models |
Proposes LocalDPO, which improves the generation quality of video diffusion models through localized detail preference optimization |
preference learning DPO |
|
|