cs.CV (2024-06-21)
📊 15 papers in total | 🔗 3 with code
🎯 Interest Area Navigation
Pillar 3: Spatial Perception & Semantics (5 🔗2)
Pillar 9: Embodied Foundation Models (4)
Pillar 2: RL Algorithms & Architecture (3 🔗1)
Pillar 1: Robot Control (2)
Pillar 8: Physics-based Animation (1)
🔬 Pillar 3: Spatial Perception & Semantics (5 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | Open-Vocabulary Temporal Action Localization using Multimodal Guidance | Proposes OVFormer, which uses multimodal guidance for open-vocabulary temporal action localization | open-vocabulary open vocabulary large language model | | |
| 2 | E2GS: Event Enhanced Gaussian Splatting | Proposes E2GS, which augments Gaussian splatting with event camera data for fast, high-quality novel view synthesis | gaussian splatting splatting NeRF | ✅ | |
| 3 | Taming 3DGS: High-Quality Radiance Fields with Limited Resources | Proposes a budget-constrained 3DGS optimization method for high-quality novel view synthesis with low resource usage | 3D gaussian splatting 3DGS gaussian splatting | | |
| 4 | Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning | Proposes multimodal task vectors to address long-context in-context learning in large multimodal models | implicit representation multimodal | ✅ | |
| 5 | Relighting Scenes with Object Insertions in Neural Radiance Fields | Proposes a NeRF-based method for object insertion and relighting, enabling realistic AR scene compositing | NeRF neural radiance field | | |
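🔬 Pillar 9: Embodied Foundation Models (4 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 6 | Multimodal Deformable Image Registration for Long-COVID Analysis Based on Progressive Alignment and Multi-perspective Loss | Proposes a multimodal deformable image registration method based on progressive alignment and a multi-perspective loss for long-COVID analysis | multimodal | | |
| 7 | Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models | Proposes the SpatialEval benchmark, revealing the shortcomings and counterintuitive behavior of VLMs in spatial reasoning | large language model multimodal | | |
| 8 | TraceNet: Segment one thing efficiently | TraceNet: efficient single-instance segmentation driven by user clicks, designed for mobile imaging applications | multimodal | | |
| 9 | Accessible, At-Home Detection of Parkinson's Disease via Multi-task Video Analysis | Proposes UFNet, an uncertainty-calibrated fusion network, to assist Parkinson's disease detection in home settings | multimodal | | |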
🔬 Pillar 2: RL Algorithms & Architecture (3 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 10 | CLIP-Decoder : ZeroShot Multilabel Classification using Multimodal CLIP Aligned Representation | Proposes CLIP-Decoder, which uses multimodally aligned representations for zero-shot multi-label classification | representation learning multimodal | | |
| 11 | VividDreamer: Towards High-Fidelity and Efficient Text-to-3D Generation | VividDreamer: proposes pose-dependent consistency distillation sampling for high-quality, efficient text-to-3D generation | dreamer distillation | ✅ | |
| 12 | VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation | VideoScore: builds automatic video evaluation metrics that simulate human feedback to improve video generation quality | reinforcement learning RLHF | | |
🔬 Pillar 1: Robot Control (2 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 13 | An End-to-End, Segmentation-Free, Arabic Handwritten Recognition Model on KHATT | Proposes an end-to-end, segmentation-free Arabic handwriting recognition model, validated on the KHATT dataset | manipulation | | |
| 14 | Landscape More Secure Than Portrait? Zooming Into the Directionality of Digital Images With Security Implications | Reveals the impact of image directionality on media security and proposes improvements | manipulation | | |
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 15 | Real-Time Hand Gesture Recognition: Integrating Skeleton-Based Data Fusion and Multi-Stream CNN | Proposes a real-time hand gesture recognition framework based on skeleton data fusion and a multi-stream CNN | spatiotemporal | | |