| 12 |
MambaVSR: Content-Aware Scanning State Space Model for Video Super-Resolution |
Proposes MambaVSR to address non-local dependency modeling in video super-resolution
Mamba SSM state space model |
|
|
| 13 |
How Visual Representations Map to Language Feature Space in Multimodal LLMs |
Proposes frozen models with a linear adapter to address vision-language alignment
representation learning large language model multimodal |
|
|
| 14 |
InceptionMamba: Efficient Multi-Stage Feature Enhancement with Selective State Space Model for Microscopic Medical Image Segmentation |
Proposes InceptionMamba to address efficiency in microscopic medical image segmentation
Mamba state space model |
|
|
| 15 |
AgentSense: Virtual Sensor Data Generation Using LLM Agents in Simulated Home Environments |
Proposes AgentSense to address the lack of diverse labeled data in smart homes
world model embodied AI large language model |
✅ |
|
| 16 |
Stop learning it all to mitigate visual hallucination, Focus on the hallucination target |
Proposes a preference learning method to mitigate visual hallucination in multimodal large language models
preference learning large language model multimodal |
|
|
| 17 |
Self-supervised Learning of Echocardiographic Video Representations via Online Cluster Distillation |
Proposes DISCOVR to address representation learning for echocardiographic videos
representation learning distillation |
✅ |
|
| 18 |
DAVID-XR1: Detecting AI-Generated Videos with Explainable Reasoning |
Proposes DAVID-XR1 to address explainability in AI-generated video detection
distillation chain-of-thought |
|
|
| 19 |
EasyARC: Evaluating Vision Language Models on True Visual Reasoning |
Proposes EasyARC to address the evaluation of multimodal visual reasoning
reinforcement learning multimodal |
|
|
| 20 |
Auto-Connect: Connectivity-Preserving RigFormer with Direct Preference Optimization |
Proposes Auto-Connect to address skeleton connectivity in automatic rigging
direct preference optimization |
|
|