cs.CV (2025-02-03)

📊 14 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (7 🔗2) · Pillar 3: Perception & Semantics (3) · Pillar 2: RL & Architecture (3 🔗2) · Pillar 1: Robot Control (1)

🔬 Pillar 9: Embodied Foundation Models (7 papers)

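| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Visual Attention Never Fades: Selective Progressive Attention ReCalibration for Detailed Image Captioning in Multimodal Large Language Models | SPARC: selective progressive attention recalibration for detailed image captioning in multimodal large language models | large language model, multimodal | |
| 2 | Efficiently Integrate Large Language Models with Visual Perception: A Survey from the Training Paradigm Perspective | Surveys training paradigms for vision-language large models, focusing on parameter-efficient modality fusion methods | large language model, multimodal | |
| 3 | Foundation Model-Based Apple Ripeness and Size Estimation for Selective Harvesting | Proposes a foundation-model-based method for estimating apple ripeness and size for selective harvesting | foundation model | |
| 4 | Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models | Uses large-scale robust image encoders to improve the robustness of multimodal large language models against adversarial attacks | large language model | |
| 5 | AdaSVD: Adaptive Singular Value Decomposition for Large Language Models | Proposes AdaSVD to address compression and performance issues in large language models | large language model | |
| 6 | Language-to-Space Programming for Training-Free 3D Visual Grounding | Proposes LaSP, a training-free 3D visual grounding method that improves efficiency and accuracy | visual grounding | |
| 7 | The in-context inductive biases of vision-language models differ across modalities | Studies how the in-context inductive biases of vision-language models differ across modalities | foundation model | |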

🔬 Pillar 3: Perception & Semantics (3 papers)

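| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 8 | UVGS: Reimagining Unstructured 3D Gaussian Splatting using UV Mapping | UVGS: restructures unstructured 3D Gaussian splatting via UV mapping for efficient generation and editing | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 9 | AquaticCLIP: A Vision-Language Foundation Model for Underwater Scene Analysis | Proposes AquaticCLIP, an underwater vision-language foundation model for underwater scene analysis | scene understanding, foundation model | |
| 10 | FourieRF: Few-Shot NeRFs via Progressive Fourier Frequency Control | FourieRF: high-quality few-shot NeRF reconstruction via progressive Fourier frequency control | NeRF, foundation model | |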

🔬 Pillar 2: RL & Architecture (3 papers)

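| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 11 | VisTA: Vision-Text Alignment Model with Contrastive Learning using Multimodal Data for Evidence-Driven, Reliable, and Explainable Alzheimer's Disease Diagnosis | VisTA: multimodal contrastive learning for evidence-driven, reliable, and explainable Alzheimer's disease diagnosis | contrastive learning, multimodal | |
| 12 | PolyhedronNet: Representation Learning for Polyhedra with Surface-attributed Graph | Proposes PolyhedronNet, which learns polyhedron representations from a surface-attributed graph for classification and retrieval | representation learning | |
| 13 | CleanPose: Category-Level Object Pose Estimation via Causal Learning and Knowledge Distillation | CleanPose: category-level object pose estimation via causal learning and knowledge distillation | distillation | |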

🔬 Pillar 1: Robot Control (1 paper)

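| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 14 | Learning Fine-to-Coarse Cuboid Shape Abstraction | Proposes an unsupervised fine-to-coarse learning method for cuboid abstraction of 3D shapes | humanoid | |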
