cs.CV (2024-12-23)

📊 16 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (8 🔗2) · Pillar 3: Perception & Semantics (6 🔗2) · Pillar 2: RL & Architecture (1) · Pillar 4: Generative Motion (1)

🔬 Pillar 9: Embodied Foundation Models (8 papers)

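| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | ChatGarment: Garment Estimation, Generation and Editing via Large Language Models | ChatGarment: uses large language models to estimate, generate, and edit garments. | large language model, multimodal | |
| 2 | A Multimodal Fusion Framework for Bridge Defect Detection with Cross-Verification | Proposes a multimodal fusion framework for bridge defect detection with cross-verification. | multimodal | |
| 3 | Reasoning to Attend: Try to Understand How <SEG> Token Works | Proposes the READ framework, which uses semantic similarity to guide LMMs to attend to target regions, improving visual grounding. | large language model, multimodal, visual grounding | |
| 4 | EPE-P: Evidence-based Parameter-efficient Prompting for Multimodal Learning with Missing Modalities | Proposes EPE-P to address missing modalities in multimodal learning, improving parameter efficiency and model performance. | multimodal | |
| 5 | SCBench: A Sports Commentary Benchmark for Video LLMs | Proposes SCBench, a benchmark for evaluating video LLMs on sports commentary generation. | large language model, chain-of-thought | |
| 6 | HumanVBench: Exploring Human-Centric Video Understanding Capabilities of MLLMs with Synthetic Benchmark Data | HumanVBench: a synthetic benchmark dataset for evaluating the human-centric video understanding capabilities of MLLMs. | large language model, multimodal | |
| 7 | S-INF: Towards Realistic Indoor Scene Synthesis via Scene Implicit Neural Field | Proposes S-INF to address multimodal relationships in indoor scene synthesis. | multimodal | |
| 8 | WildPPG: A Real-World PPG Dataset of Long Continuous Recordings | WildPPG: releases a dataset of long continuous PPG recordings and proposes a more robust method for real-world heart rate estimation. | multimodal | |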

🔬 Pillar 3: Perception & Semantics (6 papers)

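| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 9 | CoSurfGS: Collaborative 3D Surface Gaussian Splatting with Distributed Learning for Large Scene Reconstruction | Proposes CoSurfGS, a collaborative 3D surface Gaussian splatting method based on distributed learning for large-scene reconstruction. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 10 | V$^2$-SfMLearner: Learning Monocular Depth and Ego-motion for Multimodal Wireless Capsule Endoscopy | V$^2$-SfMLearner: learns monocular depth and ego-motion for wireless capsule endoscopy by fusing vibration signals. | monocular depth, scene reconstruction, multimodal | |
| 11 | LangSurf: Language-Embedded Surface Gaussians for 3D Scene Understanding | LangSurf: a language-embedded surface Gaussian representation for 3D scene understanding. | gaussian splatting, splatting, scene understanding | |
| 12 | GaussianPainter: Painting Point Cloud into 3D Gaussians with Normal Guidance | GaussianPainter: proposes a normal-guided, single-forward-pass method that paints point clouds into 3D Gaussians. | 3D gaussian splatting, gaussian splatting, splatting | |
| 13 | Reconstructing People, Places, and Cameras | Proposes HSfM, which jointly reconstructs human meshes, scene point clouds, and camera parameters from multi-view images. | scene reconstruction, spatial relationship | |
| 14 | Comprehensive Multi-Modal Prototypes are Simple and Effective Classifiers for Vast-Vocabulary Object Detection | Prova: a simple and effective multimodal prototype classifier for vast-vocabulary object detection. | open-vocabulary, open vocabulary | |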

🔬 Pillar 2: RL & Architecture (1 paper)

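| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 15 | The Dynamic Duo of Collaborative Masking and Target for Advanced Masked Autoencoder Learning | Proposes CMT-MAE, which improves masked autoencoder performance through collaborative masking and targets. | representation learning, masked autoencoder, MAE | |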

🔬 Pillar 4: Generative Motion (1 paper)

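| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 16 | GausSim: Foreseeing Reality by Gaussian Simulator for Elastic Objects | GausSim: simulates the dynamic behavior of elastic objects with Gaussian kernels to predict real-world behavior. | physically plausible | |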
