cs.CV(2024-11-30)

📊 15 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (9) · Pillar 9: Embodied Foundation Models (6 🔗2)

🔬 Pillar 3: Spatial Perception & Semantics (9 papers)

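| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 1 | Speedy-Splat: Fast 3D Gaussian Splatting with Sparse Pixels and Sparse Primitives | Speedy-Splat: accelerates 3D Gaussian Splatting rendering through sparse pixels and sparse primitives. | 3D gaussian splatting, gaussian splatting, splatting |
| 2 | LineGS : 3D Line Segment Representation on 3D Gaussian Splatting | LineGS: combines a 3D line segment representation with 3D Gaussian Splatting to improve the accuracy of structured scene reconstruction. | 3D gaussian splatting, gaussian splatting, splatting |
| 3 | LMSeg: Unleashing the Power of Large-Scale Models for Open-Vocabulary Semantic Segmentation | LMSeg: uses large-scale models for open-vocabulary semantic segmentation, improving fine-grained vision-language alignment. | open-vocabulary, open vocabulary, large language model |
| 4 | Gaussians on their Way: Wasserstein-Constrained 4D Gaussian Splatting with State-Space Modeling | Proposes Wasserstein-constrained 4D Gaussian Splatting for smooth temporal modeling of dynamic scenes. | gaussian splatting, splatting, physically plausible |
| 5 | Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding | Proposes Video-3D LLM, which strengthens LLMs' 3D scene understanding through video representations and position encoding. | scene understanding, large language model, multimodal |
| 6 | GradiSeg: Gradient-Guided Gaussian Segmentation with Enhanced 3D Boundary Precision | GradiSeg: gradient-guided Gaussian segmentation that improves 3D boundary precision. | 3D gaussian splatting, 3DGS, gaussian splatting |
| 7 | Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training | Proposes Holistic CLIP, which improves the expressiveness and generalization of vision-language pre-training through multi-view contrastive learning. | open-vocabulary, open vocabulary |
| 8 | A conditional Generative Adversarial network model for the Weather4Cast 2024 Challenge | Uses a conditional GAN for rainfall prediction, taking first place in the Weather4Cast 2024 challenge. | optical flow |
| 9 | Hybrid Local-Global Context Learning for Neural Video Compression | Proposes a neural video compression method with hybrid local-global context learning, improving motion compensation accuracy in complex scenes. | optical flow |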

🔬 Pillar 9: Embodied Foundation Models (6 papers)

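| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 10 | Accelerating Multimodal Large Language Models by Searching Optimal Vision Token Reduction | Proposes G-Search and P-Sigmoid to accelerate multimodal large language models. | large language model, multimodal |
| 11 | AgriBench: A Hierarchical Agriculture Benchmark for Multimodal Large Language Models | Proposes AgriBench, an agriculture benchmark that evaluates how well multimodal large language models perform in the agricultural domain. | large language model, multimodal |
| 12 | Planning from Imagination: Episodic Simulation and Episodic Memory for Vision-and-Language Navigation | Proposes a hybrid memory system based on episodic simulation and episodic memory, improving performance on vision-and-language navigation tasks. | VLN |
| 13 | PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation | PhyT2V: uses LLM-guided iterative self-refinement to generate text-to-video output that conforms to physical laws. | chain-of-thought |
| 14 | ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models | Proposes ATP-LLaVA to address the computational cost of large vision-language models. | large language model |
| 15 | Towards Pixel-Level Prediction for Gaze Following: Benchmark and Approach | Proposes the GazeSeg model for pixel-level gaze target prediction and builds a large-scale dataset. | foundation model |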
