cs.CV (2024-08-04)

📊 12 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (4 🔗3) Pillar 2: RL & Architecture (4 🔗2) Pillar 6: Video Extraction (1) Pillar 7: Motion Retargeting (1 🔗1) Pillar 4: Generative Motion (1) Pillar 3: Perception & Semantics (1)

🔬 Pillar 9: Embodied Foundation Models (4 papers)

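#	Title	One-sentence summary	Tags	🔗	⭐
1	Mini-Monkey: Alleviating the Semantic Sawtooth Effect for Lightweight MLLMs via Complementary Image Pyramid	Mini-Monkey proposes a complementary image pyramid to alleviate the semantic sawtooth effect in lightweight MLLMs.	large language model multimodal	✅
2	Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models	Surveys data assessment and selection methods for instruction tuning, with the aim of improving large language model performance.	large language model	✅
3	Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models	Proposes Self-Introspective Decoding (SID) to alleviate hallucinations in large vision-language models.	multimodal
4	CACE-Net: Co-guidance Attention and Contrastive Enhancement for Effective Audio-Visual Event Localization	CACE-Net combines co-guidance attention and contrastive enhancement for effective audio-visual event localization.	multimodal	✅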

🔬 Pillar 2: RL & Architecture (4 papers)

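#	Title	One-sentence summary	Tags	🔗	⭐
5	DeMansia: Mamba Never Forgets Any Tokens	Proposes DeMansia, which combines state space models with token labels to improve long-sequence handling in image classification.	Mamba state space model	✅
6	MoReFun: Past-Movement Guided Motion Representation Learning for Future Motion Prediction and Understanding	Proposes MoReFun, which uses past-movement-guided motion representation learning to improve future human motion prediction and understanding.	representation learning	✅
7	LEGO: Self-Supervised Representation Learning for Scene Text Images	Proposes LEGO, a self-supervised representation learning method for scene text images.	representation learning
8	Unsupervised Representation Learning by Balanced Self Attention Matching	Proposes BAM, an unsupervised representation learning method based on balanced self-attention matching that avoids feature collapse.	representation learning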

🔬 Pillar 6: Video Extraction (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
9	User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance	User-in-the-loop evaluation of multimodal LLMs for activity assistance; the Socratic models perform better.	egocentric large language model multimodal

🔬 Pillar 7: Motion Retargeting (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
10	KAN-RCBEVDepth: A multi-modal fusion algorithm in object detection for autonomous driving	Proposes KAN-RCBEVDepth to address 3D object detection in autonomous driving.	spatial relationship multimodal	✅

🔬 Pillar 4: Generative Motion (1 paper)

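#	Title	One-sentence summary	Tags	🔗	⭐
11	AvatarPose: Avatar-guided 3D Pose Estimation of Close Human Interaction from Sparse Multi-view Videos	AvatarPose uses personalized avatar priors to tackle 3D pose estimation of closely interacting humans from sparse multi-view videos.	penetration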

🔬 Pillar 3: Perception & Semantics (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
12	PanicleNeRF: low-cost, high-precision in-field phenotyping of rice panicles with smartphone	PanicleNeRF performs low-cost, high-precision in-field phenotyping of rice panicles using a smartphone.	NeRF
