cs.CV (2024-08-04)
📊 12 papers in total | 🔗 6 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (4 🔗3)
Pillar 2: RL Algorithms & Architecture (4 🔗2)
Pillar 6: Video Extraction & Matching (1)
Pillar 7: Motion Retargeting (1 🔗1)
Pillar 4: Generative Motion (1)
Pillar 3: Spatial Perception & Semantics (1)
🔬 Pillar 9: Embodied Foundation Models (4 papers)
| # | Title | Key Takeaway | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | Mini-Monkey: Alleviating the Semantic Sawtooth Effect for Lightweight MLLMs via Complementary Image Pyramid | Mini-Monkey proposes a complementary image pyramid to alleviate the semantic sawtooth effect in lightweight MLLMs | large language model, multimodal | ✅ | |
| 2 | Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models | Surveys data assessment and selection methods for instruction tuning to improve large language model performance | large language model | ✅ | |
| 3 | Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models | Proposes Self-Introspective Decoding (SID) to alleviate hallucinations in large vision-language models | multimodal | | |
| 4 | CACE-Net: Co-guidance Attention and Contrastive Enhancement for Effective Audio-Visual Event Localization | CACE-Net: co-guidance attention and contrastive enhancement for effective audio-visual event localization | multimodal | ✅ | |
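🔬 Pillar 2: RL Algorithms & Architecture (4 papers)
| # | Title | Key Takeaway | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 5 | DeMansia: Mamba Never Forgets Any Tokens | Proposes DeMansia, which combines state space models with token labels to improve long-sequence handling in image classification | Mamba, state space model | ✅ | |
| 6 | MoReFun: Past-Movement Guided Motion Representation Learning for Future Motion Prediction and Understanding | Proposes MoReFun, which learns motion representations guided by past movement to improve future human motion prediction and understanding | representation learning | ✅ | |
| 7 | LEGO: Self-Supervised Representation Learning for Scene Text Images | Proposes LEGO, a self-supervised representation learning method for scene text images | representation learning | | |
| 8 | Unsupervised Representation Learning by Balanced Self Attention Matching | Proposes BAM, an unsupervised representation learning method based on balanced self-attention matching that avoids feature collapse | representation learning | | |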
🔬 Pillar 6: Video Extraction & Matching (1 paper)
| # | Title | Key Takeaway | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 9 | User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance | User-in-the-loop evaluation of multimodal LLMs for activity assistance; Socratic models perform better | egocentric, large language model, multimodal | | |
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | Key Takeaway | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 10 | KAN-RCBEVDepth: A multi-modal fusion algorithm in object detection for autonomous driving | Proposes KAN-RCBEVDepth to address 3D object detection in autonomous driving | spatial relationship, multimodal | ✅ | |
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | Key Takeaway | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 11 | AvatarPose: Avatar-guided 3D Pose Estimation of Close Human Interaction from Sparse Multi-view Videos | AvatarPose: uses personalized avatar priors to tackle 3D pose estimation of closely interacting people from sparse multi-view videos | penetration | | |
🔬 Pillar 3: Spatial Perception & Semantics (1 paper)
| # | Title | Key Takeaway | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 12 | PanicleNeRF: low-cost, high-precision in-field phenotyping of rice panicles with smartphone | PanicleNeRF: low-cost, high-precision in-field phenotyping of rice panicles using a smartphone | NeRF | | |