cs.CV (2024-05-09)

📊 12 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (5 🔗3) · Pillar 9: Embodied Foundation Models (4 🔗1) · Pillar 7: Motion Retargeting (1 🔗1) · Pillar 1: Robot Control (1) · Pillar 6: Video Extraction (1)

🔬 Pillar 2: RL & Architecture (5 papers)

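| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 1 | Rethinking Efficient and Effective Point-based Networks for Event Camera Classification and Regression: EventMamba | EventMamba: an efficient and effective point-based network for classification and regression on event camera data. | Mamba, SSM, state space model |
| 2 | Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers | Lumina-T2X: proposes flow-based large diffusion transformers that generate any modality, resolution, and duration from text. | flow matching, multimodal |
| 3 | DDPM-MoCo: Advancing Industrial Surface Defect Generation and Detection with Generative and Contrastive Learning | DDPM-MoCo: combines generative and contrastive learning to improve industrial surface defect generation and detection. | contrastive learning |
| 4 | DTCLMapper: Dual Temporal Consistent Learning for Vectorized HD Map Construction | DTCLMapper: a dual temporal consistent learning method for vectorized HD map construction. | contrastive learning, scene understanding |
| 5 | Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition | Proposes a self-supervised pre-training method based on symmetric superimposition modeling to improve scene text recognition. | SSM, contrastive learning |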

🔬 Pillar 9: Embodied Foundation Models (4 papers)

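| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 6 | Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference | Proposes a Visual Tokens Withdrawal (VTW) module to accelerate inference in multimodal large language models. | large language model, multimodal |
| 7 | CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts | Proposes CuMo to improve the performance of multimodal large language models. | large language model, multimodal, instruction following |
| 8 | Enhanced Multimodal Content Moderation of Children's Videos using Audiovisual Fusion | Proposes a CLIP-based audiovisual fusion approach to strengthen content moderation of children's videos. | multimodal |
| 9 | Similarity Guided Multimodal Fusion Transformer for Semantic Location Prediction in Social Media | Proposes a similarity-guided multimodal fusion Transformer for semantic location prediction in social media. | multimodal |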

🔬 Pillar 7: Motion Retargeting (1 paper)

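| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 10 | A Mixture of Experts Approach to 3D Human Motion Prediction | Proposes a mixture-of-experts approach to 3D human motion prediction that speeds up real-time inference. | human motion, human motion prediction, motion prediction |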

🔬 Pillar 1: Robot Control (1 paper)

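| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 11 | Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control | Proposes Stable Control Representations built on pre-trained text-to-image diffusion models to improve the control capabilities of embodied agents. | manipulation, representation learning, scene understanding |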

🔬 Pillar 6: Video Extraction (1 paper)

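| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 12 | Free-Moving Object Reconstruction and Pose Estimation with Virtual Camera | Proposes a virtual-camera-based method for 3D reconstruction and pose estimation of free-moving objects from monocular video. | egocentric |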
