| 1 |
IndusAgent: Reinforcing Open-Vocabulary Industrial Anomaly Detection with Agentic Tools |
Proposes IndusAgent, which uses a tool-augmented agent to tackle open-vocabulary industrial anomaly detection. |
reinforcement learning open-vocabulary open vocabulary |
|
|
| 2 |
SpectralEarth-FM: Bringing Hyperspectral Imagery into Multimodal Earth Observation Pretraining |
Proposes SpectralEarth-FM for joint pretraining on hyperspectral imagery and multimodal Earth observation data. |
JEPA Joint-Embedding Predictive Architecture joint-embedding predictive architecture |
|
|
| 3 |
Multimodal LLMs under Pairwise Modalities |
Proposes a modality-pair-based training framework for multimodal large language models that improves cross-modal performance. |
representation learning contrastive learning large language model |
|
|
| 4 |
DriveMA: Rethinking Language Interfaces in Driving VLAs with One-Step Meta-Actions |
DriveMA: reshaping the language interface in driving VLAs with one-step meta-actions. |
reinforcement learning vision-language-action VLA |
|
|
| 5 |
Linear-DPO: Linear Direct Preference Optimization for Diffusion and Flow-Matching Generative Models |
Proposes Linear-DPO, which optimizes diffusion and flow-matching generative models via a linear utility function. |
flow matching DPO direct preference optimization |
|
|
| 6 |
3D Reconstruction and Knowledge Distillation to Improve Multi-View Image Models to Explore Spike Volume Estimation in Wheat |
Proposes a multi-view image method for wheat spike volume estimation based on 3D reconstruction and knowledge distillation. |
MAE distillation 3D reconstruction |
|
|
| 7 |
VISTA: Technical Report for the Ego4D Short-Term Object Interaction Anticipation at EgoVis 2026 |
VISTA: a temporal predictor integrating V-JEPA for Ego4D short-term object interaction anticipation. |
JEPA human-object interaction egocentric |
✅ |
|
| 8 |
QwenSafe: Multimodal Content Rating Description Identification via Preference-Aligned VLMs |
QwenSafe: multimodal content rating description identification with preference-aligned vision-language models. |
DPO direct preference optimization multimodal |
|
|
| 9 |
Distill to Think, Foresee to Act: Cognitive-Physical Reinforcement Learning for Autonomous Driving |
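Proposes CoPhy, a cognitive-physical reinforcement learning framework that improves safety and intent understanding in autonomous driving. |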
reinforcement learning imitation learning world model |
|
|
| 10 |
ProCrit: Self-Elicited Multi-Perspective Reasoning with Critic-Guided Revision for Multimodal Sarcasm Detection |
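Proposes the ProCrit framework, which improves multimodal sarcasm detection through self-elicited multi-perspective reasoning and critic-guided revision. |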
reinforcement learning multimodal |
|
|
| 11 |
RCGDet3D: Rethinking 4D Radar-Camera Fusion-based 3D Object Detection with Enhanced Radar Feature Encoding |
RCGDet3D: improving 4D radar-camera fusion-based 3D object detection through enhanced radar feature encoding. |
representation learning gaussian splatting splatting |
|
|
| 12 |
Deformba: Vision State Space Model with Adaptive State Fusion |
Deformba: a vision state space model with adaptive state fusion that improves performance on vision tasks. |
SSM state space model |
|
|
| 13 |
One-Step Distillation of Discrete Diffusion Image Generators via Fixed-Point Iteration |
Proposes Fixed-Point Distillation (FPD), a framework for efficient one-step distillation of discrete diffusion image generators. |
distillation |
|
|
| 14 |
Latent Dynamics for Full Body Avatar Animation |
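Proposes a full-body avatar animation method based on a Transformer and dynamic residual latent variables, improving clothing detail and temporal coherence. |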
latent dynamics |
|
|
| 15 |
GSA-YOLO: A High-Efficiency Framework via Structured Sparsity and Adaptive Knowledge Distillation for Real-Time X-ray Security Inspection |
GSA-YOLO: an efficient framework for X-ray security inspection using structured sparsity and adaptive knowledge distillation. |
distillation |
|
|
| 16 |
Q-ARVD: Quantizing Autoregressive Video Diffusion Models |
Q-ARVD: a new quantization framework for accelerating inference in autoregressive video diffusion models. |
world model world models |
|
|
| 17 |
JFAA: Technical Report for the EPIC-KITCHENS-100 Action Anticipation Challenge at EgoVis 2026 |
Proposes JFAA, a JEPA-based method that took first place in the EK-100 action anticipation challenge at EgoVis 2026. |
JEPA representation learning |
✅ |
|