cs.CV (2024-06-21)

📊 15 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (5 🔗2) · Pillar 9: Embodied Foundation Models (4) · Pillar 2: RL & Architecture (3 🔗1) · Pillar 1: Robot Control (2) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 3: Perception & Semantics (5 papers)

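| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 1 | Open-Vocabulary Temporal Action Localization using Multimodal Guidance | Proposes OVFormer, which uses multimodal guidance for open-vocabulary temporal action localization. | open-vocabulary, open vocabulary, large language model | |
| 2 | E2GS: Event Enhanced Gaussian Splatting | Proposes E2GS, which uses event-camera data to enhance Gaussian splatting for fast, high-quality novel view synthesis. | gaussian splatting, splatting, NeRF | |
| 3 | Taming 3DGS: High-Quality Radiance Fields with Limited Resources | Proposes a budget-constrained 3DGS optimization method for high-quality novel view synthesis with low resource usage. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 4 | Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning | Proposes multimodal task vectors to address long-context in-context learning in large multimodal models. | implicit representation, multimodal | |
| 5 | Relighting Scenes with Object Insertions in Neural Radiance Fields | Proposes a NeRF-based method for object insertion and relighting, enabling realistic AR scene compositing. | NeRF, neural radiance field | |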

🔬 Pillar 9: Embodied Foundation Models (4 papers)

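| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 6 | Multimodal Deformable Image Registration for Long-COVID Analysis Based on Progressive Alignment and Multi-perspective Loss | Proposes a multimodal deformable image registration method based on progressive alignment and a multi-perspective loss, for long-COVID analysis. | multimodal | |
| 7 | Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models | Proposes the SpatialEval benchmark, revealing shortcomings and counterintuitive behavior in the spatial reasoning of VLMs. | large language model, multimodal | |
| 8 | TraceNet: Segment one thing efficiently | TraceNet: efficient single-instance segmentation driven by user clicks, designed for mobile imaging applications. | multimodal | |
| 9 | Accessible, At-Home Detection of Parkinson's Disease via Multi-task Video Analysis | Proposes UFNet, an uncertainty-calibrated fusion network, to assist detection of Parkinson's disease in home settings. | multimodal | |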

🔬 Pillar 2: RL & Architecture (3 papers)

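| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 10 | CLIP-Decoder : ZeroShot Multilabel Classification using Multimodal CLIP Aligned Representation | Proposes CLIP-Decoder, which uses multimodal aligned representations for zero-shot multilabel classification. | representation learning, multimodal | |
| 11 | VividDreamer: Towards High-Fidelity and Efficient Text-to-3D Generation | VividDreamer: proposes pose-dependent consistency distillation sampling for high-quality, efficient text-to-3D generation. | dreamer, distillation | |
| 12 | VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation | VideoScore: builds automatic video evaluation metrics that simulate human feedback to improve video generation quality. | reinforcement learning, RLHF | |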

🔬 Pillar 1: Robot Control (2 papers)

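| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 13 | An End-to-End, Segmentation-Free, Arabic Handwritten Recognition Model on KHATT | Proposes an end-to-end, segmentation-free Arabic handwriting recognition model, validated on the KHATT dataset. | manipulation | |
| 14 | Landscape More Secure Than Portrait? Zooming Into the Directionality of Digital Images With Security Implications | Reveals how image directionality affects media security and proposes improvements. | manipulation | |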

🔬 Pillar 8: Physics-based Animation (1 paper)

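| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 15 | Real-Time Hand Gesture Recognition: Integrating Skeleton-Based Data Fusion and Multi-Stream CNN | Proposes a real-time hand gesture recognition framework based on skeleton data fusion and a multi-stream CNN. | spatiotemporal | |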
