cs.CV (2024-07-24)

📊 13 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (6) · Pillar 2: RL Algorithms & Architecture (4, 🔗 2) · Pillar 9: Embodied Foundation Models (3)

🔬 Pillar 3: Spatial Perception & Semantics (6 papers)

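| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 1 | 3D Gaussian Splatting: Survey, Technologies, Challenges, and Opportunities | A comprehensive survey of 3D Gaussian splatting that analyzes its technologies, challenges, and opportunities to support further research. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 2 | OVR: A Dataset for Open Vocabulary Temporal Repetition Counting in Videos | Introduces the OVR dataset for open-vocabulary temporal repetition counting in videos, together with a baseline model, OVRCounter. | open-vocabulary, open vocabulary, Ego4D | |
| 3 | 3D Question Answering for City Scene Understanding | Proposes the Sg-CityU model and the City-3DQA dataset for 3D multimodal question answering on city-level scenes. | scene understanding, large language model, multimodal | |
| 4 | Physical Adversarial Attack on Monocular Depth Estimation via Shape-Varying Patches | Proposes a physical adversarial attack based on shape-varying patches that improves attack effectiveness against monocular depth estimation. | depth estimation, monocular depth | |
| 5 | LangOcc: Self-Supervised Open Vocabulary Occupancy Estimation via Volume Rendering | LangOcc: self-supervised open-vocabulary occupancy estimation via volume rendering. | open-vocabulary, open vocabulary | |
| 6 | SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency | SV4D: dynamic 3D content generation with multi-frame and multi-view consistency. | NeRF | |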

🔬 Pillar 2: RL Algorithms & Architecture (4 papers)

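| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 7 | ViPer: Visual Personalization of Generative Models via Individual Preference Learning | ViPer: visual personalization of generative models through individual preference learning. | preference learning, large language model | |
| 8 | Multi-label Cluster Discrimination for Visual Representation Learning | Proposes Multi-label Cluster Discrimination (MLCD) to improve visual representation learning. | representation learning, contrastive learning | |
| 9 | Unsqueeze [CLS] Bottleneck to Learn Rich Representations | Proposes UDI, a self-supervised learning method that unsqueezes the [CLS] bottleneck to improve representation quality. | distillation, multimodal | |
| 10 | XMeCap: Meme Caption Generation with Sub-Image Adaptability | XMeCap: a meme caption generation framework with sub-image adaptability. | reinforcement learning, HuMoR | |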

🔬 Pillar 9: Embodied Foundation Models (3 papers)

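| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 11 | Nonverbal Immediacy Analysis in Education: A Multimodal Computational Model | Proposes a multimodal computational model for analyzing nonverbal immediacy in educational settings. | multimodal | |
| 12 | Diffusion Models For Multi-Modal Generative Modeling | Proposes a unified multimodal diffusion model for joint generation and modeling of multiple data types. | multimodal | |
| 13 | (PASS) Visual Prompt Locates Good Structure Sparsity through a Recurrent HyperNetwork | Proposes PASS, which uses visual prompts and a recurrent hypernetwork to find efficient structured sparsity. | large language model | |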
