cs.CV(2024-04-07)

📊 20 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (7 🔗3) · Pillar 9: Embodied Foundation Models (5 🔗1) · Pillar 2: RL Algorithms & Architecture (4) · Pillar 1: Robot Control (2) · Pillar 6: Video Extraction & Matching (2 🔗1)

🔬 Pillar 3: Spatial Perception & Semantics (7 papers)

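| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | GauU-Scene V2: Assessing the Reliability of Image-Based Metrics with Expansive Lidar Image Dataset Using 3DGS and NeRF | Proposes GauU-Scene V2 to assess the reliability of image-based metrics | 3DGS gaussian splatting splatting | |
| 2 | MemFlow: Optical Flow Estimation and Prediction with Memory | Proposes MemFlow to address real-time performance in optical flow estimation and prediction | optical flow | |
| 3 | Hyperbolic Learning with Synthetic Captions for Open-World Detection | Proposes hyperbolic learning with synthetic captions for open-world detection | open-vocabulary open vocabulary | |
| 4 | CodecNeRF: Toward Fast Encoding and Decoding, Compact, and High-quality Novel-view Synthesis | Proposes CodecNeRF to improve the encoding and decoding efficiency of NeRF representations | NeRF neural radiance field | |
| 5 | Dual-Camera Smooth Zoom on Mobile Phones | Proposes a dual-camera smooth zoom method to improve the zoom experience on mobile phones | gaussian splatting splatting | |
| 6 | NeRF2Points: Large-Scale Point Cloud Generation From Street Views' Radiance Field Optimization | Proposes NeRF2Points to generate point clouds from street-view data | NeRF neural radiance field | |
| 7 | Joint Reconstruction of 3D Human and Object via Contact-Based Refinement Transformer | Proposes CONTHO for joint reconstruction of 3D humans and objects | 3D reconstruction | |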

🔬 Pillar 9: Embodied Foundation Models (5 papers)

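| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 8 | GenEARL: A Training-Free Generative Framework for Multimodal Event Argument Role Labeling | Proposes GenEARL for multimodal event argument role labeling | large language model multimodal | |
| 9 | DinoBloom: A Foundation Model for Generalizable Cell Embeddings in Hematology | Proposes DinoBloom to improve the generalization of cell embeddings in hematology | foundation model | |
| 10 | X-VARS: Introducing Explainability in Football Refereeing with Multi-Modal Large Language Model | Proposes X-VARS to make football refereeing decisions explainable | large language model | |
| 11 | Facial Affective Behavior Analysis with Instruction Tuning | Proposes a new approach to facial affective behavior analysis to address data scarcity | large language model instruction following | |
| 12 | Mixture of Low-rank Experts for Transferable AI-Generated Image Detection | Proposes a mixture of low-rank experts for detecting AI-generated images | zero-shot transfer | |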

🔬 Pillar 2: RL Algorithms & Architecture (4 papers)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 13 | DREAM: Improving Video-Text Retrieval Through Relevance-Based Augmentation Using Large Foundation Models | Proposes DREAM to address insufficient data representation in video-text retrieval | representation learning large language model foundation model | |
| 14 | VMambaMorph: a Multi-Modality Deformable Image Registration Framework based on Visual State Space Model with Cross-Scan Module | Proposes VMambaMorph for multimodal medical image registration | Mamba SSM state space model | |
| 15 | A Clinical-oriented Multi-level Contrastive Learning Method for Disease Diagnosis in Low-quality Medical Images | Proposes a clinical-oriented multi-level contrastive learning method for disease diagnosis in low-quality medical images | representation learning contrastive learning | |
| 16 | FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback | Proposes FGAIF to address the alignment of vision-language models | reinforcement learning PPO | |

🔬 Pillar 1: Robot Control (2 papers)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 17 | A Unified Diffusion Framework for Scene-aware Human Motion Estimation from Sparse Signals | Proposes S²Fusion for human motion estimation from sparse signals | motion tracking penetration scene-aware motion | |
| 18 | AUEditNet: Dual-Branch Facial Action Unit Intensity Manipulation with Implicit Disentanglement | Proposes AUEditNet for manipulating facial action unit intensity | manipulation | |

🔬 Pillar 6: Video Extraction & Matching (2 papers)

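| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 19 | Spatial Cognition from Egocentric Video: Out of Sight, Not Out of Mind | Proposes the LMK method for 3D tracking of dynamic objects | egocentric | |
| 20 | UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection | Proposes UniMD to unify temporal action detection and moment retrieval | Ego4D | |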
