| 1 |
PointGS: Semantic-Consistent Unsupervised 3D Point Cloud Segmentation with 3D Gaussian Splatting |
PointGS: Semantically consistent unsupervised 3D point cloud segmentation using 3D Gaussian splatting |
contrastive learning 3D gaussian splatting gaussian splatting |
|
|
| 2 |
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture |
SenseNova-U1: A unified multimodal understanding and generation model based on the NEO-unify architecture |
world model world models vision-language-action |
|
|
| 3 |
PairDropGS: Paired Dropout-Induced Consistency Regularization for Sparse-View Gaussian Splatting |
Proposes PairDropGS to address unstable sparse-view Gaussian splatting reconstruction |
representation learning 3D gaussian splatting 3DGS |
|
|
| 4 |
Learn to Think: Improving Multimodal Reasoning through Vision-Aware Self-Improvement Training |
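VISTA: Proposes a vision-aware self-improvement training framework that strengthens the reasoning ability of multimodal large language models |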
preference learning large language model multimodal |
|
|
| 5 |
Selection, Not Fusion: Radar-Modulated State Space Models for Radar-Camera Depth Estimation |
Proposes a radar-modulated selection mechanism for radar-camera depth estimation |
Mamba state space model MAE |
|
|
| 6 |
Lite3R: A Model-Agnostic Framework for Efficient Feed-Forward 3D Reconstruction |
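Lite3R: A model-agnostic, efficient feed-forward 3D reconstruction framework that reduces computational overhead while preserving accuracy. |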
linear attention teacher-student distillation |
✅ |
|
| 7 |
Beyond Localization: A Comprehensive Diagnosis of Perspective-Conditioned Spatial Reasoning in MLLMs from Omnidirectional Images |
Proposes the PCSR-Bench benchmark to diagnose the perspective-conditioned spatial reasoning of MLLMs on omnidirectional images |
reward design reward shaping egocentric |
|
|
| 8 |
CaC: Advancing Video Reward Models via Hierarchical Spatiotemporal Concentrating |
Proposes CaC, a framework based on vision-language models, to improve the accuracy and interpretability of video anomaly detection. |
reinforcement learning spatiotemporal chain-of-thought |
|
|
| 9 |
HorizonDrive: Self-Corrective Autoregressive World Model for Long-horizon Driving Simulation |
HorizonDrive: A self-corrective autoregressive world model for long-horizon driving simulation |
world model world models distillation |
|
|
| 10 |
TCP-SSM: Efficient Vision State Space Models with Token-Conditioned Poles |
Proposes TCP-SSM, which uses token-conditioned poles to improve the efficiency and interpretability of vision state space models. |
Mamba SSM state space model |
|
|
| 11 |
VIP: Visual-guided Prompt Evolution for Efficient Dense Vision-Language Inference |
Proposes VIP, a visual-guided prompt evolution method for efficient dense vision-language inference. |
VIP distillation open-vocabulary |
✅ |
|
| 12 |
SyncDPO: Enhancing Temporal Synchronization in Video-Audio Joint Generation via Preference Learning |
Proposes SyncDPO to address temporal synchronization in joint video-audio generation |
preference learning DPO direct preference optimization |
✅ |
|
| 13 |
3D-Belief: Embodied Belief Inference via Generative 3D World Modeling |
Proposes 3D-Belief, which performs embodied belief inference through generative 3D world modeling. |
world model world models |
|
|
| 14 |
Contrastive Learning under Noisy Temporal Self-Supervision for Colonoscopy Videos |
Proposes a noise-aware temporal self-supervised contrastive learning method for colonoscopy videos |
contrastive learning foundation model |
✅ |
|
| 15 |
Hide to See: Reasoning-prefix Masking for Visual-anchored Thinking in VLM Distillation |
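Proposes a visually anchored distillation method based on reasoning-prefix masking, improving how well VLMs use visual information in multimodal reasoning. |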
distillation multimodal |
|
|
| 16 |
When Policy Entropy Constraint Fails: Preserving Diversity in Flow-based RLHF via Perceptual Entropy |
Proposes a perceptual entropy constraint to address diversity collapse in RLHF fine-tuning of flow models |
flow matching RLHF |
✅ |
|
| 17 |
Interactive State Space Model with Cross-Modal Local Scanning for Depth Super-Resolution |
Proposes a depth super-resolution method with cross-modal local scanning, built on an interactive state space model |
Mamba state space model |
|
|
| 18 |
The DAWN of World-Action Interactive Models |
Proposes WAIM to address the mutual dependence between world prediction and action generation |
world model world models world action model |
|
|
| 19 |
FIS-DiT: Breaking the Few-Step Video Inference Barrier via Training-Free Frame Interleaved Sparsity |
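Proposes FIS-DiT, which uses training-free frame-interleaved sparsity to break the inference-speed bottleneck of video diffusion models. |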
predictive model distillation spatiotemporal |
|
|
| 20 |
Large-Small Model Collaboration for Farmland Semantic Change Detection |
Proposes a large-small model collaboration framework to address pseudo-changes in farmland semantic change detection. |
Mamba multimodal |
✅ |
|
| 21 |
Cross-Modal-Domain Generalization Through Semantically Aligned Discrete Representations |
Proposes CoDAAR, which achieves cross-modal domain generalization through semantically aligned discrete representations |
representation learning multimodal |
|
|
| 22 |
Learning Subspace-Preserving Sparse Attention Graphs from Heterogeneous Multiview Data |
Proposes SAGL, which learns subspace-preserving sparse attention graphs from heterogeneous multiview data for unsupervised transfer learning. |
linear attention representation learning |
|
|
| 23 |
DORA: Dynamic Online Reinforcement Agent for Token Merging in Vision Transformers |
Proposes DORA, a reinforcement-learning-based online inference method for dynamic token merging in ViTs |
reinforcement learning distillation |
|
|