cs.CV (2024-05-06)
📊 23 papers total | 🔗 7 with code
🎯 Interest Area Navigation
Pillar 2: RL Algorithms & Architecture (6)
Pillar 3: Spatial Perception & Semantics (5 🔗2)
Pillar 9: Embodied Foundation Models (5 🔗2)
Pillar 5: Interaction & Reaction (3)
Pillar 7: Motion Retargeting (2 🔗1)
Pillar 4: Generative Motion (1 🔗1)
Pillar 8: Physics-based Animation (1 🔗1)
🔬 Pillar 2: RL Algorithms & Architecture (6 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | MoDiPO: text-to-motion alignment via AI-feedback-driven Direct Preference Optimization | MoDiPO: aligns text-to-motion generation via AI-feedback-driven Direct Preference Optimization | DPO direct preference optimization motion diffusion | | |
| 2 | WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning | Introduces the WorldQA dataset to probe multimodal world-knowledge understanding in videos through long-chain reasoning | world model large language model multimodal | | |
| 3 | MemoryMamba: Memory-Augmented State Space Model for Defect Recognition | Proposes MemoryMamba, a memory-augmented state space model for industrial defect recognition | Mamba SSM state space model | | |
| 4 | Retinexmamba: Retinex-based Mamba for Low-light Image Enhancement | RetinexMamba: a low-light image enhancement method based on Retinex theory and the Mamba architecture | Mamba SSM state space model | | |
| 5 | GeoContrastNet: Contrastive Key-Value Edge Learning for Language-Agnostic Document Understanding | GeoContrastNet: a contrastive key-value edge learning method for language-agnostic document understanding | contrastive learning spatial relationship | | |
| 6 | Statistical Edge Detection And UDF Learning For Shape Representation | Proposes a UDF learning method based on statistical edge detection that improves how accurately neural distance functions represent 3D shapes | representation learning implicit representation | | |
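🔬 Pillar 3: Spatial Perception & Semantics (5 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 7 | A Construct-Optimize Approach to Sparse View Synthesis without Camera Pose | Proposes a construct-optimize method based on 3D Gaussian splatting for sparse view synthesis without camera poses | depth estimation monocular depth 3D gaussian splatting | ✅ | |
| 8 | Gaussian Splatting: 3D Reconstruction and Novel View Synthesis, a Review | Reviews Gaussian splatting techniques for 3D reconstruction and novel view synthesis | gaussian splatting splatting | | |
| 9 | Neural Graph Map: Dense Mapping with Efficient Loop Closure Integration | Proposes Neural Graph Map, which achieves dense mapping through efficient loop closure integration | visual SLAM | ✅ | |
| 10 | Diffeomorphic Template Registration for Atmospheric Turbulence Mitigation | Proposes an atmospheric turbulence mitigation method based on diffeomorphic template registration | optical flow | | |
| 11 | Hierarchical Space-Time Attention for Micro-Expression Recognition | Proposes HSTA, a hierarchical space-time attention network, to improve micro-expression recognition accuracy | optical flow | | |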
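🔬 Pillar 9: Embodied Foundation Models (5 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 12 | Foundation Models for Video Understanding: A Survey | Surveys video foundation models and how they address the challenges of video understanding tasks | foundation model | ✅ | |
| 13 | Advancing Multimodal Medical Capabilities of Gemini | Med-Gemini: a Gemini-based multimodal medical foundation model that improves performance on a range of medical tasks | multimodal | | |
| 14 | Research on Image Recognition Technology Based on Multimodal Deep Learning | Proposes a human action recognition algorithm based on multimodal deep learning that improves the accuracy of detecting pedestrian behavior in video | multimodal | | |
| 15 | Language-Image Models with 3D Understanding | Proposes Cube-LLM, which uses large-scale pretraining to give language-image models 3D scene understanding and reasoning | large language model chain-of-thought | ✅ | |
| 16 | Modality Prompts for Arbitrary Modality Salient Object Detection | Proposes a modality-adaptive Transformer based on modality prompts for arbitrary-modality salient object detection | multimodal | | |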
🔬 Pillar 5: Interaction & Reaction (3 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 17 | Intra-task Mutual Attention based Vision Transformer for Few-Shot Learning | Proposes a Vision Transformer based on intra-task mutual attention for few-shot learning, improving classification accuracy | mutual attention | | |
| 18 | Pose Priors from Language Models | Uses language models as priors for more accurate 3D human pose estimation | two-person interaction multimodal | | |
| 19 | Dual Relation Mining Network for Zero-Shot Learning | Proposes DRMN, a dual relation mining network, to address insufficient modeling of visual-semantic relations in zero-shot learning | interaction transformer | | |
🔬 Pillar 7: Motion Retargeting (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 20 | AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding | AniTalker: generates vivid and diverse talking faces through identity-decoupled facial motion encoding | motion representation | ✅ | |
| 21 | Spatial and Surface Correspondence Field for Interaction Transfer | Proposes a spatial and surface correspondence field method for interaction transfer, improving the accuracy and effectiveness of the transfer | spatial relationship | | |
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 22 | LGTM: Local-to-Global Text-Driven Human Motion Diffusion Model | Proposes LGTM, a local-to-global text-driven human motion diffusion model that improves semantic consistency | motion diffusion model motion diffusion text-to-motion | ✅ | |
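🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 23 | Enhancing Spatiotemporal Disease Progression Models via Latent Diffusion and Prior Knowledge | Proposes BrLP, a model based on latent diffusion and prior knowledge, to improve the accuracy of spatiotemporal brain disease progression prediction | spatiotemporal | ✅ | |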