cs.CV (2024-05-06)

📊 23 papers total | 🔗 7 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (6) · Pillar 3: Perception & Semantics (5 🔗2) · Pillar 9: Embodied Foundation Models (5 🔗2) · Pillar 5: Interaction & Reaction (3) · Pillar 7: Motion Retargeting (2 🔗1) · Pillar 4: Generative Motion (1 🔗1) · Pillar 8: Physics-based Animation (1 🔗1)

🔬 Pillar 2: RL & Architecture (6 papers)

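| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | MoDiPO: text-to-motion alignment via AI-feedback-driven Direct Preference Optimization | MoDiPO: aligns text-to-motion generation through AI-feedback-driven Direct Preference Optimization | DPO, direct preference optimization, motion diffusion | |
| 2 | WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning | Introduces the WorldQA dataset to probe multimodal world-knowledge understanding in videos through long-chain reasoning | world model, large language model, multimodal | |
| 3 | MemoryMamba: Memory-Augmented State Space Model for Defect Recognition | Proposes MemoryMamba, a memory-augmented state space model for industrial defect recognition | Mamba, SSM, state space model | |
| 4 | Retinexmamba: Retinex-based Mamba for Low-light Image Enhancement | RetinexMamba: a low-light image enhancement method based on Retinex theory and the Mamba architecture | Mamba, SSM, state space model | |
| 5 | GeoContrastNet: Contrastive Key-Value Edge Learning for Language-Agnostic Document Understanding | GeoContrastNet: a contrastive key-value edge learning method for language-agnostic document understanding | contrastive learning, spatial relationship | |
| 6 | Statistical Edge Detection And UDF Learning For Shape Representation | Proposes a UDF learning method based on statistical edge detection that improves how accurately neural distance functions represent 3D shapes | representation learning, implicit representation | |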

🔬 Pillar 3: Perception & Semantics (5 papers)

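| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 7 | A Construct-Optimize Approach to Sparse View Synthesis without Camera Pose | Proposes a construct-optimize approach based on 3D Gaussian splatting for sparse view synthesis without camera poses | depth estimation, monocular depth, 3D gaussian splatting | |
| 8 | Gaussian Splatting: 3D Reconstruction and Novel View Synthesis, a Review | Reviews Gaussian splatting techniques for 3D reconstruction and novel view synthesis | gaussian splatting, splatting | |
| 9 | Neural Graph Map: Dense Mapping with Efficient Loop Closure Integration | Proposes Neural Graph Map, which achieves dense mapping through efficient loop closure integration | visual SLAM | |
| 10 | Diffeomorphic Template Registration for Atmospheric Turbulence Mitigation | Proposes an atmospheric turbulence mitigation method based on diffeomorphic template registration | optical flow | |
| 11 | Hierarchical Space-Time Attention for Micro-Expression Recognition | Proposes HSTA, a hierarchical space-time attention network, to improve micro-expression recognition accuracy | optical flow | |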

🔬 Pillar 9: Embodied Foundation Models (5 papers)

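| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 12 | Foundation Models for Video Understanding: A Survey | Surveys foundation models for addressing the challenges of video understanding tasks | foundation model | |
| 13 | Advancing Multimodal Medical Capabilities of Gemini | Med-Gemini: Gemini-based multimodal medical models that improve performance on a range of medical tasks | multimodal | |
| 14 | Research on Image Recognition Technology Based on Multimodal Deep Learning | Proposes a human action recognition algorithm based on multimodal deep learning that improves the accuracy of pedestrian behavior detection in video | multimodal | |
| 15 | Language-Image Models with 3D Understanding | Proposes Cube-LLM, which gives language-image models 3D scene understanding and reasoning through large-scale pretraining | large language model, chain-of-thought | |
| 16 | Modality Prompts for Arbitrary Modality Salient Object Detection | Proposes a modality-adaptive Transformer based on modality prompts for arbitrary-modality salient object detection | multimodal | |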

🔬 Pillar 5: Interaction & Reaction (3 papers)

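| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 17 | Intra-task Mutual Attention based Vision Transformer for Few-Shot Learning | Proposes a Vision Transformer based on intra-task mutual attention for few-shot learning, improving classification accuracy | mutual attention | |
| 18 | Pose Priors from Language Models | Uses language models as priors for more accurate 3D human pose estimation | two-person interaction, multimodal | |
| 19 | Dual Relation Mining Network for Zero-Shot Learning | Proposes DRMN, a dual relation mining network, to address insufficient modeling of visual-semantic relations in zero-shot learning | interaction transformer | |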

🔬 Pillar 7: Motion Retargeting (2 papers)

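| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 20 | AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding | AniTalker: generates vivid and diverse talking faces through identity-decoupled facial motion encoding | motion representation | |
| 21 | Spatial and Surface Correspondence Field for Interaction Transfer | Proposes a spatial and surface correspondence field method for interaction transfer, improving transfer accuracy and effectiveness | spatial relationship | |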

🔬 Pillar 4: Generative Motion (1 paper)

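| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 22 | LGTM: Local-to-Global Text-Driven Human Motion Diffusion Model | Proposes LGTM, a local-to-global text-driven human motion diffusion model that improves semantic consistency | motion diffusion model, motion diffusion, text-to-motion | |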

🔬 Pillar 8: Physics-based Animation (1 paper)

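| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 23 | Enhancing Spatiotemporal Disease Progression Models via Latent Diffusion and Prior Knowledge | Proposes BrLP, a model based on latent diffusion and prior knowledge, to improve the accuracy of spatiotemporal brain disease progression prediction | spatiotemporal | |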
