cs.CV (2024-07-26)

📊 15 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (5 🔗2) · Pillar 9: Embodied Foundation Models (5 🔗1) · Pillar 3: Perception & Semantics (3 🔗2) · Pillar 1: Robot Control (2 🔗1)

🔬 Pillar 2: RL & Architecture (5 papers)

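| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 1 | Graph-based Unsupervised Disentangled Representation Learning via Multimodal Large Language Models | Proposes an unsupervised disentangled representation learning framework based on graphs and multimodal large language models, addressing correlations among semantic factors. | DRL, representation learning, large language model |
| 2 | ScalingGaussian: Enhancing 3D Content Creation with Generative Gaussian Splatting | ScalingGaussian: combines 3D and 2D diffusion models to improve high-quality 3D content generation. | distillation, gaussian splatting, splatting |
| 3 | Towards A Generalizable Pathology Foundation Model via Unified Knowledge Distillation | Proposes a general-purpose pathology foundation model based on unified knowledge distillation, improving generalization across clinical tasks. | representation learning, distillation, foundation model |
| 4 | VSSD: Vision Mamba with Non-Causal State Space Duality | Proposes VSSD, a vision Mamba model with non-causal state space duality, improving performance on vision tasks. | Mamba, SSM, state space model |
| 5 | Modality-Balanced Learning for Multimedia Recommendation | Proposes counterfactual knowledge distillation to address modality imbalance in multimodal recommendation. | distillation, multimodal |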

🔬 Pillar 9: Embodied Foundation Models (5 papers)

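| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 6 | MangaUB: A Manga Understanding Benchmark for Large Multimodal Models | MangaUB: a new benchmark for evaluating the manga understanding capabilities of large multimodal models. | multimodal |
| 7 | Algorithm Research of ELMo Word Embedding and Deep Learning Multimodal Transformer in Image Description | Proposes ELMo-MCT, a zero-shot medical image description algorithm, improving generalization on known classes. | multimodal |
| 8 | HICEScore: A Hierarchical Metric for Image Captioning Evaluation | Proposes HICEScore, a hierarchical reference-free metric for image captioning evaluation, addressing existing methods' insufficient sensitivity to local hallucinations and fine-grained visual information. | large language model, multimodal |
| 9 | Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment | Proposes MARNet, which uses diffusion models to unify visual and semantic feature spaces, strengthening cross-modal alignment and improving image classification robustness. | multimodal |
| 10 | Learning Spectral-Decomposed Tokens for Domain Generalized Semantic Segmentation | Proposes a spectral-decomposed token learning framework, improving domain-generalized semantic segmentation performance. | foundation model |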

🔬 Pillar 3: Perception & Semantics (3 papers)

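| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 11 | IOVS4NeRF: Incremental Optimal View Selection for Large-Scale NeRFs | IOVS4NeRF: an incremental optimal view selection method for large-scale NeRFs. | NeRF, neural radiance field, scene reconstruction |
| 12 | HybridDepth: Robust Metric Depth Fusion by Leveraging Depth from Focus and Single-Image Priors | Proposes HybridDepth, which fuses depth from focus with single-image priors for robust metric depth estimation. | depth estimation, metric depth |
| 13 | Revisit Event Generation Model: Self-Supervised Learning of Event-to-Video Reconstruction with Implicit Neural Representations | Proposes EvINR, which uses implicit neural representations for self-supervised learning of event-to-video reconstruction, without optical flow estimation. | optical flow, spatiotemporal |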

🔬 Pillar 1: Robot Control (2 papers)

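| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 14 | Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers | Proposes Relational Priors Distillation (RPD), improving cross-domain point cloud classification performance. | sim-to-real, distillation |
| 15 | Floating No More: Object-Ground Reconstruction from a Single Image | Proposes the ORG model, which reconstructs the 3D geometric relationship between an object and the ground from a single image, addressing the floating-object problem. | manipulation |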
