cs.CV (2024-05-18)

📊 9 papers in total | 🔗 5 with code

🎯 Areas of Interest

Pillar 3: Perception & Semantics (3 🔗2) | Pillar 9: Embodied Foundation Models (2) | Pillar 2: RL & Architecture (2 🔗2) | Pillar 4: Generative Motion (1 🔗1) | Pillar 8: Physics-based Animation (1)

🔬 Pillar 3: Perception & Semantics (3 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	Dusk Till Dawn: Self-supervised Nighttime Stereo Depth Estimation using Visual Foundation Models	Proposes a self-supervised nighttime stereo depth estimation method built on visual foundation models, improving depth prediction accuracy in low-light conditions.	depth estimation, stereo depth, foundation model	✅
2	MotionGS : Compact Gaussian Splatting SLAM by Motion Filter	MotionGS: a compact Gaussian splatting SLAM system based on a motion filter.	3D gaussian splatting, 3DGS, gaussian splatting
3	GestFormer: Multiscale Wavelet Pooling Transformer Network for Dynamic Hand Gesture Recognition	Proposes GestFormer, a dynamic hand gesture recognition network based on a multiscale wavelet pooling Transformer.	optical flow, multimodal	✅

🔬 Pillar 9: Embodied Foundation Models (2 papers)

#	Title	One-line summary	Tags	🔗	⭐
4	EyeFound: A Multimodal Generalist Foundation Model for Ophthalmic Imaging	EyeFound: a multimodal generalist foundation model for ophthalmic imaging.	foundation model, multimodal
5	Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions	Proposes the CARA framework, which addresses hallucinations caused by insufficient context in VLU benchmarks and improves model reliability.	multimodal

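🔬 Pillar 2: RL & Architecture (2 papers)

#	Title	One-line summary	Tags	🔗	⭐
6	Automated Multi-level Preference for MLLMs	Proposes the AMP framework, which uses automated multi-level preference learning to improve the performance of multimodal large language models and reduce hallucinations.	reinforcement learning, preference learning, RLHF	✅
7	Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching	Proposes Trajectory Score Matching (TSM), which resolves the pseudo-ground-truth inconsistency problem in text-to-3D generation and improves high-resolution generation.	dreamer, distillation, 3D gaussian splatting	✅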

🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
8	Motion Avatar: Generate Human and Animal Avatars with Arbitrary Motion	Proposes Motion Avatar, which generates customizable, animated 3D avatars of humans and animals from text queries.	motion generation	✅

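🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
9	Towards Gradient-based Time-Series Explanations through a SpatioTemporal Attention Network	Proposes a gradient-based time-series explanation method built on a spatiotemporal attention network, applied to identifying key frames in medical activity videos.	spatiotemporal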
