cs.CV (2024-08-31)
📊 16 papers in total | 🔗 4 with code
🎯 Interest Area Navigation
Pillar 2: RL & Architecture (6 🔗2)
Pillar 9: Embodied Foundation Models (4 🔗1)
Pillar 3: Perception & Semantics (3)
Pillar 8: Physics-based Animation (1 🔗1)
Pillar 7: Motion Retargeting (1)
Pillar 1: Robot Control (1)
🔬 Pillar 2: RL & Architecture (6 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | RI-MAE: Rotation-Invariant Masked AutoEncoders for Self-Supervised Point Cloud Representation Learning | Proposes RI-MAE to address the lack of rotation invariance in self-supervised point cloud learning. | representation learning masked autoencoder MAE | ✅ | |
| 2 | Aligning Medical Images with General Knowledge from Large Language Models | Proposes the ViP framework, which uses visual-symptom-guided prompt learning to improve the knowledge transfer of CLIP models in medical image analysis. | representation learning VIP large language model | | |
| 3 | Learning Co-Speech Gesture Representations in Dialogue through Contrastive Learning: An Intrinsic Evaluation | Proposes a contrastive learning method for co-speech gesture representation learning in dialogue, improving gesture similarity matching. | representation learning contrastive learning multimodal | | |
| 4 | A Hybrid Transformer-Mamba Network for Single Image Deraining | Proposes TransMamba, a hybrid Transformer-Mamba network for single image deraining. | Mamba state space model | | |
| 5 | Compositional 3D-aware Video Generation with LLM Director | Proposes a compositional 3D-aware video generation method directed by an LLM, enabling finer-grained control over video content. | distillation large language model | | |
| 6 | TrackSSM: A General Motion Predictor by State-Space Model | Proposes TrackSSM to address motion prediction in multi-object tracking. | Mamba SSM state space model | ✅ | |
🔬 Pillar 9: Embodied Foundation Models (4 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 7 | Geospatial foundation models for image analysis: evaluating and enhancing NASA-IBM Prithvi's domain adaptability | Evaluates and enhances the domain adaptability of NASA-IBM Prithvi for geospatial image analysis. | large language model foundation model | | |
| 8 | Digit Recognition using Multimodal Spiking Neural Networks | Proposes a multimodal spiking neural network that fuses visual and auditory information to improve digit recognition accuracy. | multimodal | | |
| 9 | Comparative Analysis of Modality Fusion Approaches for Audio-Visual Person Identification and Verification | Compares voice-face multimodal fusion strategies to improve person identification and verification accuracy. | multimodal | | |
| 10 | COSMo: CLIP Talks on Open-Set Multi-Target Domain Adaptation | COSMo: proposes a CLIP-based open-set multi-target domain adaptation method that addresses domain shift in visual and semantic features. | foundation model | ✅ | |
🔬 Pillar 3: Perception & Semantics (3 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 11 | 3D Gaussian Splatting for Large-scale Surface Reconstruction from Aerial Images | Proposes AGS to address the challenges of applying 3D Gaussian splatting to large-scale surface reconstruction from aerial images. | 3D gaussian splatting 3DGS gaussian splatting | | |
| 12 | UDGS-SLAM : UniDepth Assisted Gaussian Splatting for Monocular SLAM | UDGS-SLAM: monocular SLAM using UniDepth-assisted Gaussian splatting. | depth estimation UniDepth gaussian splatting | | |
| 13 | EgoHDM: An Online Egocentric-Inertial Human Motion Capture, Localization, and Dense Mapping System | EgoHDM: an online egocentric-inertial system for human motion capture, localization, and dense mapping. | scene reconstruction elevation map physically plausible | | |
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 14 | StimuVAR: Spatiotemporal Stimuli-aware Video Affective Reasoning with Multimodal Large Language Models | Proposes StimuVAR, which uses multimodal large language models for spatiotemporal stimuli-aware video affective reasoning. | spatiotemporal large language model multimodal | ✅ | |
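🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 15 | Training-Free Sketch-Guided Diffusion with Latent Optimization | Proposes a training-free sketch-guided diffusion model based on latent optimization, enabling precise control of image generation. | latent optimization | | |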
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 16 | Data Augmentation for Image Classification using Generative AI | Proposes the AGA framework, which uses generative AI for image classification data augmentation to improve model generalization. | manipulation large language model | | |