cs.CV (2025-01-04)
📊 8 papers in total | 🔗 1 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (4 🔗1)
Pillar 3: Perception & Semantics (2)
Pillar 2: RL & Architecture (2)
🔬 Pillar 9: Embodied Foundation Models (4 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
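| 1 | What Kind of Visual Tokens Do We Need? Training-free Visual Token Pruning for Multi-modal Large Language Models from the Perspective of Graph | Proposes G-Prune to address visual token redundancy in multimodal large language models | large language model, multimodal | | |
| 2 | Generating Multimodal Images with GAN: Integrating Text, Image, and Style | Proposes a GAN-based multimodal image generation method that integrates text, image, and style information | multimodal | | |
| 3 | A Survey of State of the Art Large Vision Language Models: Alignment, Benchmark, Evaluations and Challenges | Provides a comprehensive survey of alignment, benchmarks, evaluation, and challenges for large vision-language models (VLMs) | multimodal | ✅ | |
| 4 | Benchmarking Large and Small MLLMs | Systematically evaluates large and small multimodal large language models, revealing their capability boundaries and application potential | multimodal | | |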
🔬 Pillar 3: Perception & Semantics (2 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
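| 5 | Joint Optimization for 4D Human-Scene Reconstruction in the Wild | Proposes JOSH for joint 4D human-scene reconstruction from in-the-wild monocular video | scene reconstruction, human-scene interaction, human mesh recovery | | |
| 6 | From Images to Detection: Machine Learning for Blood Pattern Classification | Proposes a machine-learning method for bloodstain pattern classification that distinguishes gunshot from impact bloodstains, improving the efficiency of crime scene reconstruction | scene reconstruction | | |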
🔬 Pillar 2: RL & Architecture (2 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
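| 7 | Hyperbolic Contrastive Learning for Hierarchical 3D Point Cloud Embedding | Proposes a hierarchical 3D point cloud embedding method based on hyperbolic contrastive learning, improving downstream task performance | contrastive learning | | |
| 8 | Distillation-Enhanced Physical Adversarial Attacks | Proposes a knowledge-distillation-based physical adversarial attack method that improves stealthiness and attack performance | distillation | | |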