cs.CV (2024-09-14)

📊 11 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (3 🔗1) · Pillar 1: Robot Control (2) · Pillar 8: Physics-based Animation (2 🔗1) · Pillar 3: Perception & Semantics (2 🔗1) · Pillar 2: RL & Architecture (1 🔗1) · Pillar 4: Generative Motion (1)

🔬 Pillar 9: Embodied Foundation Models (3 papers)

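# | Title | One-sentence summary | Tags | 🔗
1 | Keypoint-Integrated Instruction-Following Data Generation for Enhanced Human Pose and Action Understanding in Multimodal Models | Proposes a keypoint-integrated instruction-following data generation method to improve multimodal models' understanding of human pose and action | multimodal, instruction following |
2 | On the Generalizability of Foundation Models for Crop Type Mapping | Evaluates the generalizability and geographic bias of remote sensing foundation models in crop type mapping | foundation model |
3 | AI-Driven Virtual Teacher for Enhanced Educational Efficiency: Leveraging Large Pretrain Models for Autonomous Error Analysis and Correction | Proposes VATE, which uses large language models for autonomous error analysis and correction to improve teaching efficiency | large language model |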

🔬 Pillar 1: Robot Control (2 papers)

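# | Title | One-sentence summary | Tags | 🔗
4 | ManiDext: Hand-Object Manipulation Synthesis via Continuous Correspondence Embeddings and Residual-Guided Diffusion | ManiDext: hand-object manipulation synthesis based on continuous correspondence embeddings and residual-guided diffusion | manipulation, dexterous manipulation, bi-manual |
5 | ChildPlay-Hand: A Dataset of Hand Manipulations in the Wild | Proposes the ChildPlay-Hand dataset for studying hand manipulation interactions of children and adults in real-world scenes | manipulation, HOI, egocentric |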

🔬 Pillar 8: Physics-based Animation (2 papers)

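# | Title | One-sentence summary | Tags | 🔗
6 | Multimodal Power Outage Prediction for Rapid Disaster Response and Resource Allocation | Proposes the VST-GNN model for multimodal power outage prediction, supporting rapid post-disaster response and resource allocation | spatiotemporal, multimodal |
7 | MHAD: Multimodal Home Activity Dataset with Multi-Angle Videos and Synchronized Physiological Signals | Proposes MHAD, a multimodal home activity dataset, to improve video-based physiological signal analysis in home environments | PULSE, multimodal |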

🔬 Pillar 3: Perception & Semantics (2 papers)

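# | Title | One-sentence summary | Tags | 🔗
8 | Associate Everything Detected: Facilitating Tracking-by-Detection to the Unknown | Proposes the AED framework, which associates all detections to handle multi-object tracking of known and unknown categories in a unified way | open-vocabulary, open vocabulary |
9 | Real-Time Stochastic Terrain Mapping and Processing for Autonomous Safe Landing | Proposes a real-time stochastic terrain mapping algorithm based on Gaussian process regression for autonomous safe landing | elevation map |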

🔬 Pillar 2: RL & Architecture (1 paper)

# | Title | One-sentence summary | Tags | 🔗
10 | Evaluating Pre-trained Convolutional Neural Networks and Foundation Models as Feature Extractors for Content-based Medical Image Retrieval | Evaluates pre-trained CNNs and foundation models as feature extractors for medical image retrieval | contrastive learning, foundation model |

🔬 Pillar 4: Generative Motion (1 paper)

# | Title | One-sentence summary | Tags | 🔗
11 | LawDNet: Enhanced Audio-Driven Lip Synthesis via Local Affine Warping Deformation | Proposes LawDNet to address audio-driven lip synthesis | motion synthesis |
