cs.CV (2024-07-26)
📊 15 papers total | 🔗 6 with code
🎯 Interest Area Navigation
Pillar 2: RL & Architecture (5 🔗2)
Pillar 9: Embodied Foundation Models (5 🔗1)
Pillar 3: Perception & Semantics (3 🔗2)
Pillar 1: Robot Control (2 🔗1)
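🔬 Pillar 2: RL & Architecture (5 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | Graph-based Unsupervised Disentangled Representation Learning via Multimodal Large Language Models | Proposes an unsupervised disentangled representation learning framework based on graphs and multimodal large language models to address correlations among semantic factors. | DRL, representation learning, large language model | | |
| 2 | ScalingGaussian: Enhancing 3D Content Creation with Generative Gaussian Splatting | ScalingGaussian: combines 3D and 2D diffusion models to improve high-quality 3D content generation. | distillation, gaussian splatting, splatting | | |
| 3 | Towards A Generalizable Pathology Foundation Model via Unified Knowledge Distillation | Proposes a general-purpose pathology foundation model built on unified knowledge distillation, improving generalization across clinical tasks. | representation learning, distillation, foundation model | | |
| 4 | VSSD: Vision Mamba with Non-Causal State Space Duality | Proposes VSSD, a Vision Mamba model with non-causal state space duality, improving performance on vision tasks. | Mamba, SSM, state space model | ✅ | |
| 5 | Modality-Balanced Learning for Multimedia Recommendation | Proposes counterfactual knowledge distillation to address modality imbalance in multimodal recommendation. | distillation, multimodal | ✅ | |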
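🔬 Pillar 9: Embodied Foundation Models (5 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 6 | MangaUB: A Manga Understanding Benchmark for Large Multimodal Models | MangaUB: a new benchmark for evaluating the manga understanding ability of large multimodal models. | multimodal | | |
| 7 | Algorithm Research of ELMo Word Embedding and Deep Learning Multimodal Transformer in Image Description | Proposes ELMo-MCT, a zero-shot medical image description algorithm, improving generalization on known categories. | multimodal | | |
| 8 | HICEScore: A Hierarchical Metric for Image Captioning Evaluation | Proposes HICEScore, a hierarchical reference-free metric for image captioning evaluation, addressing existing methods' insufficient sensitivity to local hallucinations and fine-grained visual information. | large language model, multimodal | ✅ | |
| 9 | Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment | Proposes MARNet, which uses diffusion models to unify visual and semantic feature spaces, strengthening cross-modal alignment and improving the robustness of image classification. | multimodal | | |
| 10 | Learning Spectral-Decomposed Tokens for Domain Generalized Semantic Segmentation | Proposes a spectral-decomposed token learning framework that improves domain-generalized semantic segmentation. | foundation model | | |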
🔬 Pillar 3: Perception & Semantics (3 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 11 | IOVS4NeRF: Incremental Optimal View Selection for Large-Scale NeRFs | IOVS4NeRF: an incremental optimal view selection method for large-scale NeRFs. | NeRF, neural radiance field, scene reconstruction | | |
| 12 | HybridDepth: Robust Metric Depth Fusion by Leveraging Depth from Focus and Single-Image Priors | Proposes HybridDepth, which fuses depth from focus with single-image priors for robust metric depth estimation. | depth estimation, metric depth | ✅ | |
| 13 | Revisit Event Generation Model: Self-Supervised Learning of Event-to-Video Reconstruction with Implicit Neural Representations | Proposes EvINR, which uses implicit neural representations for self-supervised event-to-video reconstruction without optical flow estimation. | optical flow, spatiotemporal | ✅ | |
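🔬 Pillar 1: Robot Control (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 14 | Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers | Proposes Relational Priors Distillation (RPD) to improve cross-domain point cloud classification. | sim-to-real, distillation | ✅ | |
| 15 | Floating No More: Object-Ground Reconstruction from a Single Image | Proposes the ORG model, which reconstructs the 3D geometric relationship between an object and the ground from a single image, fixing the floating-object problem. | manipulation | | |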