cs.CV (2024-08-31)

📊 16 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (6 🔗2) · Pillar 9: Embodied Foundation Models (4 🔗1) · Pillar 3: Perception & Semantics (3) · Pillar 8: Physics-based Animation (1 🔗1) · Pillar 7: Motion Retargeting (1) · Pillar 1: Robot Control (1)

🔬 Pillar 2: RL & Architecture (6 papers)

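# | Title | One-line summary | Tags | 🔗
1 | RI-MAE: Rotation-Invariant Masked AutoEncoders for Self-Supervised Point Cloud Representation Learning | Proposes RI-MAE to address the lack of rotation invariance in self-supervised point cloud learning. | representation learning, masked autoencoder, MAE |
2 | Aligning Medical Images with General Knowledge from Large Language Models | Proposes the ViP framework, which uses visual-symptom-guided prompt learning to improve the knowledge transfer of CLIP models in medical image analysis. | representation learning, VIP, large language model |
3 | Learning Co-Speech Gesture Representations in Dialogue through Contrastive Learning: An Intrinsic Evaluation | Proposes a contrastive-learning method for learning co-speech gesture representations in dialogue, improving gesture similarity matching. | representation learning, contrastive learning, multimodal |
4 | A Hybrid Transformer-Mamba Network for Single Image Deraining | Proposes TransMamba, a hybrid Transformer-Mamba network for single image deraining. | Mamba, state space model |
5 | Compositional 3D-aware Video Generation with LLM Director | Proposes a compositional 3D-aware video generation method with an LLM director, enabling finer-grained control over video content. | distillation, large language model |
6 | TrackSSM: A General Motion Predictor by State-Space Model | Proposes TrackSSM to address motion prediction in multi-object tracking. | Mamba, SSM, state space model |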

🔬 Pillar 9: Embodied Foundation Models (4 papers)

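# | Title | One-line summary | Tags | 🔗
7 | Geospatial foundation models for image analysis: evaluating and enhancing NASA-IBM Prithvi's domain adaptability | Evaluates and enhances the domain adaptability of NASA-IBM Prithvi for geospatial image analysis. | large language model, foundation model |
8 | Digit Recognition using Multimodal Spiking Neural Networks | Proposes a multimodal spiking neural network that fuses visual and auditory information to improve digit recognition accuracy. | multimodal |
9 | Comparative Analysis of Modality Fusion Approaches for Audio-Visual Person Identification and Verification | Compares voice-face multimodal fusion strategies to improve identification and verification accuracy. | multimodal |
10 | COSMo: CLIP Talks on Open-Set Multi-Target Domain Adaptation | Proposes COSMo, a CLIP-based open-set multi-target domain adaptation method that addresses domain shift in visual and semantic features. | foundation model |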

🔬 Pillar 3: Perception & Semantics (3 papers)

# | Title | One-line summary | Tags | 🔗
11 | 3D Gaussian Splatting for Large-scale Surface Reconstruction from Aerial Images | Proposes AGS to address the challenges of 3D Gaussian splatting in large-scale surface reconstruction from aerial images. | 3D gaussian splatting, 3DGS, gaussian splatting |
12 | UDGS-SLAM : UniDepth Assisted Gaussian Splatting for Monocular SLAM | UDGS-SLAM: monocular SLAM using UniDepth-assisted Gaussian splatting. | depth estimation, UniDepth, gaussian splatting |
13 | EgoHDM: An Online Egocentric-Inertial Human Motion Capture, Localization, and Dense Mapping System | EgoHDM: an online egocentric-inertial system for human motion capture, localization, and dense mapping. | scene reconstruction, elevation map, physically plausible |

🔬 Pillar 8: Physics-based Animation (1 paper)

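# | Title | One-line summary | Tags | 🔗
14 | StimuVAR: Spatiotemporal Stimuli-aware Video Affective Reasoning with Multimodal Large Language Models | Proposes StimuVAR, which uses multimodal large language models for spatiotemporal stimuli-aware video affective reasoning. | spatiotemporal, large language model, multimodal |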

🔬 Pillar 7: Motion Retargeting (1 paper)

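# | Title | One-line summary | Tags | 🔗
15 | Training-Free Sketch-Guided Diffusion with Latent Optimization | Proposes a training-free, sketch-guided diffusion model based on latent optimization, enabling precise control over image generation. | latent optimization |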

🔬 Pillar 1: Robot Control (1 paper)

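# | Title | One-line summary | Tags | 🔗
16 | Data Augmentation for Image Classification using Generative AI | Proposes the AGA framework, which uses generative AI to augment image classification data and improve model generalization. | manipulation, large language model |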
