cs.CV (2024-05-09)

📊 12 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (5 🔗3) · Pillar 9: Embodied Foundation Models (4 🔗1) · Pillar 7: Motion Retargeting (1 🔗1) · Pillar 1: Robot Control (1) · Pillar 6: Video Extraction & Matching (1)

🔬 Pillar 2: RL Algorithms & Architecture (5 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	Rethinking Efficient and Effective Point-based Networks for Event Camera Classification and Regression: EventMamba	EventMamba: an efficient and effective point-based network for classification and regression on event camera data	Mamba SSM state space model	✅
2	Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers	Lumina-T2X: flow-based large diffusion Transformers that generate any modality, resolution, and duration from text	flow matching multimodal
3	DDPM-MoCo: Advancing Industrial Surface Defect Generation and Detection with Generative and Contrastive Learning	DDPM-MoCo: combines generative and contrastive learning to improve industrial surface defect generation and detection	contrastive learning
4	DTCLMapper: Dual Temporal Consistent Learning for Vectorized HD Map Construction	DTCLMapper: a dual temporal consistent learning method for vectorized HD map construction	contrastive learning scene understanding	✅
5	Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition	Proposes self-supervised pre-training with symmetric superimposition modeling to improve scene text recognition	SSM contrastive learning	✅

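🔬 Pillar 9: Embodied Foundation Models (4 papers)

#	Title	One-line summary	Tags	🔗	⭐
6	Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference	Proposes a Visual Tokens Withdrawal (VTW) module to speed up inference in multimodal large language models	large language model multimodal
7	CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts	Proposes CuMo, which scales multimodal LLMs with co-upcycled mixture-of-experts to improve performance	large language model multimodal instruction following	✅
8	Enhanced Multimodal Content Moderation of Children's Videos using Audiovisual Fusion	Proposes a CLIP variant enhanced with audiovisual fusion to strengthen content moderation of children's videos	multimodal
9	Similarity Guided Multimodal Fusion Transformer for Semantic Location Prediction in Social Media	Proposes a similarity-guided multimodal fusion Transformer for semantic location prediction in social media	multimodal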

🔬 Pillar 7: Motion Retargeting (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
10	A Mixture of Experts Approach to 3D Human Motion Prediction	Proposes a mixture-of-experts approach to 3D human motion prediction that speeds up real-time inference	human motion human motion prediction motion prediction	✅

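🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
11	Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control	Proposes Stable Control Representations built on pre-trained text-to-image diffusion models to improve the control capabilities of embodied agents	manipulation representation learning scene understanding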

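🔬 Pillar 6: Video Extraction & Matching (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
12	Free-Moving Object Reconstruction and Pose Estimation with Virtual Camera	Proposes a virtual-camera-based method for 3D reconstruction and pose estimation of free-moving objects from monocular video	egocentric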
