| 1 |
GST-VLA: Structured Gaussian Spatial Tokens for 3D Depth-Aware Vision-Language-Action Models |
Proposes GST-VLA to address the geometric-structure problem in 3D depth-aware vision-language-action models. |
flow matching affordance vision-language-action |
|
|
| 2 |
GSStream: 3D Gaussian Splatting based Volumetric Scene Streaming System |
Proposes GSStream, a volumetric scene streaming system based on 3D Gaussian Splatting that optimizes bandwidth usage. |
reinforcement learning deep reinforcement learning DRL |
|
|
| 3 |
OddGridBench: Exposing the Lack of Fine-Grained Visual Discrepancy Sensitivity in Multimodal Large Language Models |
OddGridBench exposes the shortcomings of multimodal large language models in perceiving fine-grained visual discrepancies. |
reinforcement learning curriculum learning reward design |
✅ |
|
| 4 |
EventVGGT: Exploring Cross-Modal Distillation for Consistent Event-based Depth Estimation |
EventVGGT: explores cross-modal distillation to achieve consistent depth estimation from event cameras. |
distillation depth estimation monocular depth |
|
|
| 5 |
Multimodal Graph Representation Learning with Dynamic Information Pathways |
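Proposes a multimodal graph representation learning framework based on dynamic information pathways, improving learning on heterogeneous graph data. |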
representation learning multimodal |
|
|
| 6 |
Progressive Representation Learning for Multimodal Sentiment Analysis with Incomplete Modalities |
Proposes the PRLF framework to address the feature misalignment caused by missing modalities in multimodal sentiment analysis. |
representation learning multimodal |
|
|
| 7 |
AutoViVQA: A Large-Scale Automatically Constructed Dataset for Vietnamese Visual Question Answering |
Proposes AutoViVQA, a large-scale, automatically constructed dataset for Vietnamese visual question answering. |
representation learning visual pre-training large language model |
|
|
| 8 |
Towards Unified Multimodal Interleaved Generation via Group Relative Policy Optimization |
Proposes a reinforcement-learning-based post-training strategy that enables interleaved generation in unified multimodal models. |
reinforcement learning multimodal |
|
|
| 9 |
Progressive Split Mamba: Effective State Space Modelling for Image Restoration |
Proposes Progressive Split Mamba to effectively model long-range dependencies in image restoration. |
Mamba SSM state space model |
|
|
| 10 |
From Semantics to Pixels: Coarse-to-Fine Masked Autoencoders for Hierarchical Visual Understanding |
C2FMAE: proposes a coarse-to-fine masked autoencoder for hierarchical visual understanding. |
masked autoencoder contrastive learning visual pre-training |
|
|
| 11 |
Streaming Autoregressive Video Generation via Diagonal Distillation |
Proposes diagonal distillation to accelerate autoregressive video generation, enabling real-time streaming. |
distillation optical flow |
|
|
| 12 |
Decoder-Free Distillation for Quantized Image Restoration |
Proposes the QDR framework, which improves quantized image restoration through decoder-free distillation and learnable weights. |
teacher-student distillation |
|
|
| 13 |
Exploring Modality-Aware Fusion and Decoupled Temporal Propagation for Multi-Modal Object Tracking |
MDTrack: proposes modality-aware fusion and decoupled temporal propagation for multi-modal object tracking. |
SSM state space model multimodal |
|
|
| 14 |
UniField: A Unified Field-Aware MRI Enhancement Framework |
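Proposes UniField, a unified framework that leverages MRI data acquired at multiple field strengths to improve enhancement quality and generalization. |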
flow matching representation learning foundation model |
|
|
| 15 |
RubiCap: Rubric-Guided Reinforcement Learning for Dense Image Captioning |
RubiCap: a rubric-guided reinforcement learning method for dense image captioning. |
reinforcement learning distillation |
|
|
| 16 |
WS-Net: Weak-Signal Representation Learning and Gated Abundance Reconstruction for Hyperspectral Unmixing via State-Space and Weak Signal Attention Fusion |
Proposes WS-Net to address weak-signal hyperspectral unmixing. |
Mamba representation learning |
|
|
| 17 |
RAE-NWM: Navigation World Model in Dense Visual Representation Space |
Proposes RAE-NWM to address the problem of state evolution in visual navigation. |
world model |
|
|
| 18 |
M3GCLR: Multi-View Mini-Max Infinite Skeleton-Data Game Contrastive Learning For Skeleton-Based Action Recognition |
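Proposes the M3GCLR framework, which improves skeleton-based action recognition accuracy through multi-view adversarial contrastive learning. |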
contrastive learning |
|
|
| 19 |
ForgeDreamer: Industrial Text-to-3D Generation with Multi-Expert LoRA and Cross-View Hypergraph |
ForgeDreamer proposes multi-expert LoRA and a cross-view hypergraph to tackle industrial-grade text-to-3D generation. |
dreamer |
|
|
| 20 |
IntroSVG: Learning from Rendering Feedback for Text-to-SVG Generation via an Introspective Generator-Critic Framework |
Proposes the IntroSVG framework, which improves text-to-SVG generation quality through introspective generator-critic learning. |
DPO direct preference optimization |
|
|
| 21 |
RTFDNet: Fusion-Decoupling for Robust RGB-T Segmentation |
Proposes RTFDNet, which achieves robust RGB-T semantic segmentation through fusion-decoupling. |
teacher-student distillation |
✅ |
|