cs.CV (2024-04-07)

📊 20 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (7 🔗3) | Pillar 9: Embodied Foundation Models (5 🔗1) | Pillar 2: RL Algorithms and Architecture (RL & Architecture) (4) | Pillar 1: Robot Control (2) | Pillar 6: Video Extraction and Matching (Video Extraction) (2 🔗1)

🔬 Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (7 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
1	GauU-Scene V2: Assessing the Reliability of Image-Based Metrics with Expansive Lidar Image Dataset Using 3DGS and NeRF	Proposes GauU-Scene V2 to assess the reliability of image-based metrics	3DGS gaussian splatting splatting
2	MemFlow: Optical Flow Estimation and Prediction with Memory	Proposes MemFlow to address real-time performance in optical flow estimation and prediction	optical flow	✅
3	Hyperbolic Learning with Synthetic Captions for Open-World Detection	Proposes hyperbolic learning with synthetic captions to address open-world detection	open-vocabulary open vocabulary
4	CodecNeRF: Toward Fast Encoding and Decoding, Compact, and High-quality Novel-view Synthesis	Proposes CodecNeRF to address the encoding and decoding efficiency of NeRF representations	NeRF neural radiance field
5	Dual-Camera Smooth Zoom on Mobile Phones	Proposes a dual-camera smooth zoom method to improve the zoom experience on mobile phones	gaussian splatting splatting	✅
6	NeRF2Points: Large-Scale Point Cloud Generation From Street Views' Radiance Field Optimization	Proposes NeRF2Points to address point cloud generation from street-view data	NeRF neural radiance field
7	Joint Reconstruction of 3D Human and Object via Contact-Based Refinement Transformer	Proposes CONTHO to address joint reconstruction of 3D humans and objects	3D reconstruction	✅

🔬 Pillar 9: Embodied Foundation Models (5 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
8	GenEARL: A Training-Free Generative Framework for Multimodal Event Argument Role Labeling	Proposes GenEARL to address multimodal event argument role labeling	large language model multimodal
9	DinoBloom: A Foundation Model for Generalizable Cell Embeddings in Hematology	Proposes DinoBloom to address the generalization of cell embeddings in hematology	foundation model
10	X-VARS: Introducing Explainability in Football Refereeing with Multi-Modal Large Language Model	Proposes X-VARS to address the explainability of football refereeing decisions	large language model
11	Facial Affective Behavior Analysis with Instruction Tuning	Proposes a new method for facial affective behavior analysis to address data scarcity	large language model instruction following
12	Mixture of Low-rank Experts for Transferable AI-Generated Image Detection	Proposes a mixture of low-rank experts to address AI-generated image detection	zero-shot transfer	✅

🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (4 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
13	DREAM: Improving Video-Text Retrieval Through Relevance-Based Augmentation Using Large Foundation Models	Proposes DREAM to address insufficient data representation in video-text retrieval	representation learning large language model foundation model
14	VMambaMorph: a Multi-Modality Deformable Image Registration Framework based on Visual State Space Model with Cross-Scan Module	Proposes VMambaMorph to address multi-modality medical image registration	Mamba SSM state space model
15	A Clinical-oriented Multi-level Contrastive Learning Method for Disease Diagnosis in Low-quality Medical Images	Proposes a clinical-oriented multi-level contrastive learning method to address disease diagnosis in low-quality medical images	representation learning contrastive learning
16	FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback	Proposes FGAIF to address the alignment of vision-language models	reinforcement learning PPO
🔬 Pillar 1: Robot Control (2 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
17	A Unified Diffusion Framework for Scene-aware Human Motion Estimation from Sparse Signals	Proposes S²Fusion to address human motion estimation from sparse signals	motion tracking penetration scene-aware motion
18	AUEditNet: Dual-Branch Facial Action Unit Intensity Manipulation with Implicit Disentanglement	Proposes AUEditNet to address facial action unit intensity manipulation	manipulation

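🔬 Pillar 6: Video Extraction and Matching (Video Extraction) (2 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
19	Spatial Cognition from Egocentric Video: Out of Sight, Not Out of Mind	Proposes the LMK method to address 3D tracking of dynamic objects	egocentric
20	UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection	Proposes UniMD to unify temporal action detection and moment retrieval	Ego4D	✅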
