cs.CV (2025-01-26)

📊 10 papers in total | 🔗 6 with code

🎯 Navigation by area of interest

Pillar 2: RL Algorithms & Architecture (4 papers, 🔗 3 with code) · Pillar 3: Spatial Perception & Semantics (3 papers, 🔗 2 with code) · Pillar 9: Embodied Foundation Models (3 papers, 🔗 1 with code)

🔬 Pillar 2: RL Algorithms & Architecture (4 papers)

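# | Title | Key point | Tags | 🔗
1 | Scaling Large Vision-Language Models for Enhanced Multimodal Comprehension In Biomedical Image Analysis | Fine-tunes LLaVA to improve multimodal comprehension in biomedical image analysis. | distillation, large language model, multimodal |
2 | CD-Lamba: Boosting Remote Sensing Change Detection via a Cross-Temporal Locally Adaptive State Space Model | Proposes CD-Lamba, which improves change detection in remote sensing images with a cross-temporal, locally adaptive state space model. | Mamba, SSM, state space model |
3 | MimicGait: A Model Agnostic approach for Occluded Gait Recognition using Correlational Knowledge Distillation | MimicGait: a model-agnostic correlational knowledge distillation approach to gait recognition under occlusion. | distillation |
4 | Visual Generation Without Guidance | Proposes Guidance-Free Training (GFT), which achieves high-performance visual generation without guidance and lowers computational cost. | distillation, classifier-free guidance |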

🔬 Pillar 3: Spatial Perception & Semantics (3 papers)

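# | Title | Key point | Tags | 🔗
5 | GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting | Proposes GaussianToken, which uses 2D Gaussian splatting to strengthen the representational capacity of an image tokenizer. | gaussian splatting, splatting |
6 | TinyLLaVA-Video: Towards Smaller LMMs for Video Understanding with Group Resampler | Proposes TinyLLaVA-Video, which uses a group resampler for lightweight video understanding and outperforms 7B models. | scene understanding, multimodal |
7 | Can Pose Transfer Models Generate Realistic Human Motion? | Finds that pose transfer models still struggle to generate realistic human motion, and that action recognition accuracy on their outputs needs improvement. | splatting |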

🔬 Pillar 9: Embodied Foundation Models (3 papers)

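# | Title | Key point | Tags | 🔗
8 | An Explainable Biomedical Foundation Model via Large-Scale Concept-Enhanced Vision-Language Pre-training | Proposes ConceptCLIP, the first explainable biomedical foundation model, which improves diagnostic accuracy while providing interpretability. | foundation model, multimodal |
9 | Ocean-OCR: Towards General OCR Application via a Vision-Language Model | Ocean-OCR: general-purpose OCR applications built on a vision-language model. | large language model, multimodal |
10 | SedarEval: Automated Evaluation using Self-Adaptive Rubrics | SedarEval: an automated evaluation method based on self-adaptive rubrics that improves the accuracy and stability of LLM evaluation. | large language model |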
