cs.CV (2024-04-07)
📊 20 papers in total | 🔗 5 with code
🎯 Interest Area Navigation
Pillar 3: Spatial Perception & Semantics (7 🔗3)
Pillar 9: Embodied Foundation Models (5 🔗1)
Pillar 2: RL Algorithms & Architecture (4)
Pillar 1: Robot Control (2)
Pillar 6: Video Extraction & Matching (2 🔗1)
🔬 Pillar 3: Spatial Perception & Semantics (7 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | GauU-Scene V2: Assessing the Reliability of Image-Based Metrics with Expansive Lidar Image Dataset Using 3DGS and NeRF | Proposes GauU-Scene V2 to assess the reliability of image-based metrics | 3DGS gaussian splatting splatting | | |
| 2 | MemFlow: Optical Flow Estimation and Prediction with Memory | Proposes MemFlow to address real-time performance in optical flow estimation and prediction | optical flow | ✅ | |
| 3 | Hyperbolic Learning with Synthetic Captions for Open-World Detection | Proposes hyperbolic learning with synthetic captions for open-world detection | open-vocabulary open vocabulary | | |
| 4 | CodecNeRF: Toward Fast Encoding and Decoding, Compact, and High-quality Novel-view Synthesis | Proposes CodecNeRF to improve the encoding and decoding efficiency of NeRF representations | NeRF neural radiance field | | |
| 5 | Dual-Camera Smooth Zoom on Mobile Phones | Proposes a dual-camera smooth zoom method to improve the zoom experience on mobile phones | gaussian splatting splatting | ✅ | |
| 6 | NeRF2Points: Large-Scale Point Cloud Generation From Street Views' Radiance Field Optimization | Proposes NeRF2Points to address point cloud generation from street-view data | NeRF neural radiance field | | |
| 7 | Joint Reconstruction of 3D Human and Object via Contact-Based Refinement Transformer | Proposes CONTHO for joint reconstruction of 3D humans and objects | 3D reconstruction | ✅ | |
🔬 Pillar 9: Embodied Foundation Models (5 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | GenEARL: A Training-Free Generative Framework for Multimodal Event Argument Role Labeling | Proposes GenEARL for multimodal event argument role labeling | large language model multimodal | | |
| 9 | DinoBloom: A Foundation Model for Generalizable Cell Embeddings in Hematology | Proposes DinoBloom to address the generalization of cell embeddings in hematology | foundation model | | |
| 10 | X-VARS: Introducing Explainability in Football Refereeing with Multi-Modal Large Language Model | Proposes X-VARS to address the explainability of football refereeing decisions | large language model | | |
| 11 | Facial Affective Behavior Analysis with Instruction Tuning | Proposes a new approach to facial affective behavior analysis to address data scarcity | large language model instruction following | | |
| 12 | Mixture of Low-rank Experts for Transferable AI-Generated Image Detection | Proposes a mixture of low-rank experts for AI-generated image detection | zero-shot transfer | ✅ | |
🔬 Pillar 2: RL Algorithms & Architecture (4 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 13 | DREAM: Improving Video-Text Retrieval Through Relevance-Based Augmentation Using Large Foundation Models | Proposes DREAM to address insufficient data representation in video-text retrieval | representation learning large language model foundation model | | |
| 14 | VMambaMorph: a Multi-Modality Deformable Image Registration Framework based on Visual State Space Model with Cross-Scan Module | Proposes VMambaMorph for multi-modality medical image registration | Mamba SSM state space model | | |
| 15 | A Clinical-oriented Multi-level Contrastive Learning Method for Disease Diagnosis in Low-quality Medical Images | Proposes a clinical-oriented multi-level contrastive learning method for disease diagnosis in low-quality medical images | representation learning contrastive learning | | |
| 16 | FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback | Proposes FGAIF to address the alignment of vision-language models | reinforcement learning PPO | | |
🔬 Pillar 1: Robot Control (2 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 17 | A Unified Diffusion Framework for Scene-aware Human Motion Estimation from Sparse Signals | Proposes S²Fusion for human motion estimation from sparse signals | motion tracking penetration scene-aware motion | | |
| 18 | AUEditNet: Dual-Branch Facial Action Unit Intensity Manipulation with Implicit Disentanglement | Proposes AUEditNet for facial action unit intensity manipulation | manipulation | | |
🔬 Pillar 6: Video Extraction & Matching (2 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 19 | Spatial Cognition from Egocentric Video: Out of Sight, Not Out of Mind | Proposes the LMK method for 3D tracking of dynamic objects | egocentric | | |
| 20 | UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection | Proposes UniMD to unify temporal action detection and moment retrieval | Ego4D | ✅ | |