cs.CV (2024-08-04)

📊 12 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (4 🔗3) · Pillar 2: RL & Architecture (4 🔗2) · Pillar 6: Video Extraction (1) · Pillar 7: Motion Retargeting (1 🔗1) · Pillar 4: Generative Motion (1) · Pillar 3: Perception & Semantics (1)

🔬 Pillar 9: Embodied Foundation Models (4 papers)

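| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Mini-Monkey: Alleviating the Semantic Sawtooth Effect for Lightweight MLLMs via Complementary Image Pyramid | Mini-Monkey proposes a complementary image pyramid to alleviate the semantic sawtooth effect in lightweight MLLMs. | large language model, multimodal | |
| 2 | Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models | Surveys data assessment and selection methods for instruction tuning, with the aim of improving large language model performance. | large language model | |
| 3 | Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models | Proposes Self-Introspective Decoding (SID) to alleviate hallucinations in large vision-language models. | multimodal | |
| 4 | CACE-Net: Co-guidance Attention and Contrastive Enhancement for Effective Audio-Visual Event Localization | CACE-Net: co-guidance attention and contrastive enhancement for effective audio-visual event localization. | multimodal | |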

🔬 Pillar 2: RL & Architecture (4 papers)

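| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 5 | DeMansia: Mamba Never Forgets Any Tokens | Proposes DeMansia, which combines state space models with token labeling to improve long-sequence handling in image classification. | Mamba, state space model | |
| 6 | MoReFun: Past-Movement Guided Motion Representation Learning for Future Motion Prediction and Understanding | Proposes MoReFun, which uses past-movement guided motion representation learning to improve future human motion prediction and understanding. | representation learning | |
| 7 | LEGO: Self-Supervised Representation Learning for Scene Text Images | Proposes LEGO, a self-supervised representation learning method for scene text images. | representation learning | |
| 8 | Unsupervised Representation Learning by Balanced Self Attention Matching | Proposes BAM, an unsupervised representation learning method based on balanced self-attention matching that avoids feature collapse. | representation learning | |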

🔬 Pillar 6: Video Extraction (1 paper)

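| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 9 | User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance | User-in-the-loop evaluation of multimodal LLMs for activity assistance; Socratic models perform better. | egocentric, large language model, multimodal | |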

🔬 Pillar 7: Motion Retargeting (1 paper)

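| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 10 | KAN-RCBEVDepth: A multi-modal fusion algorithm in object detection for autonomous driving | Proposes KAN-RCBEVDepth to address 3D object detection in autonomous driving. | spatial relationship, multimodal | |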

🔬 Pillar 4: Generative Motion (1 paper)

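| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 11 | AvatarPose: Avatar-guided 3D Pose Estimation of Close Human Interaction from Sparse Multi-view Videos | AvatarPose: uses personalized avatar priors to tackle 3D pose estimation of closely interacting people from sparse multi-view videos. | penetration | |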

🔬 Pillar 3: Perception & Semantics (1 paper)

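| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 12 | PanicleNeRF: low-cost, high-precision in-field phenotyping of rice panicles with smartphone | PanicleNeRF: low-cost, high-precision in-field phenotyping of rice panicles using a smartphone. | NeRF | |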
