cs.CV (2024-07-24)
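📊 13 papers in total | 🔗 2 with code
🎯 Interest Area Navigation
Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (6)
Pillar 2: RL Algorithms and Architecture (RL & Architecture) (4 🔗2)
Pillar 9: Embodied Large Models (Embodied Foundation Models) (3)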
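🔬 Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (6 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | 3D Gaussian Splatting: Survey, Technologies, Challenges, and Opportunities | A comprehensive survey of 3D Gaussian Splatting that analyzes its technologies, challenges, and opportunities to support further research. | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 2 | OVR: A Dataset for Open Vocabulary Temporal Repetition Counting in Videos | Proposes the OVR dataset for open-vocabulary temporal repetition counting in videos, along with the baseline model OVRCounter. | open-vocabulary, open vocabulary, Ego4D | | |
| 3 | 3D Question Answering for City Scene Understanding | Proposes the Sg-CityU model and the City-3DQA dataset for 3D multimodal question answering in city-level scenes. | scene understanding, large language model, multimodal | | |
| 4 | Physical Adversarial Attack on Monocular Depth Estimation via Shape-Varying Patches | Proposes a physical adversarial attack based on shape-varying patches that improves attack effectiveness against monocular depth estimation. | depth estimation, monocular depth | | |
| 5 | LangOcc: Self-Supervised Open Vocabulary Occupancy Estimation via Volume Rendering | LangOcc: self-supervised open-vocabulary occupancy estimation via volume rendering. | open-vocabulary, open vocabulary | | |
| 6 | SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency | SV4D: dynamic 3D content generation with multi-frame and multi-view consistency. | NeRF | | |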
🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (4 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 7 | ViPer: Visual Personalization of Generative Models via Individual Preference Learning | ViPer: visual personalization of generative models via individual preference learning. | preference learning, large language model | | |
| 8 | Multi-label Cluster Discrimination for Visual Representation Learning | Proposes Multi-label Cluster Discrimination (MLCD) to improve visual representation learning. | representation learning, contrastive learning | ✅ | |
| 9 | Unsqueeze [CLS] Bottleneck to Learn Rich Representations | Proposes UDI, a self-supervised learning method that unsqueezes the [CLS] bottleneck to improve representation quality. | distillation, multimodal | ✅ | |
| 10 | XMeCap: Meme Caption Generation with Sub-Image Adaptability | XMeCap: a meme caption generation framework with sub-image adaptability. | reinforcement learning, HuMoR | | |
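🔬 Pillar 9: Embodied Large Models (Embodied Foundation Models) (3 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 11 | Nonverbal Immediacy Analysis in Education: A Multimodal Computational Model | Proposes a multimodal computational model for analyzing nonverbal immediacy in educational settings. | multimodal | | |
| 12 | Diffusion Models For Multi-Modal Generative Modeling | Proposes a unified multimodal diffusion model for joint generation and modeling of multiple data types. | multimodal | | |
| 13 | (PASS) Visual Prompt Locates Good Structure Sparsity through a Recurrent HyperNetwork | Proposes PASS, which uses visual prompts and a recurrent hypernetwork to find efficient structured sparsity. | large language model | | |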