cs.CV (2024-08-25)

📊 18 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (6 🔗3) · Pillar 9: Embodied Foundation Models (6 🔗1) · Pillar 2: RL & Architecture (3 🔗1) · Pillar 1: Robot Control (2) · Pillar 5: Interaction & Reaction (1)

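🔬 Pillar 3: Perception & Semantics (6 papers)

| # | Title | One-line summary | Tags |
|---|-------|------------------|------|
| 1 | TranSplat: Generalizable 3D Gaussian Splatting from Sparse Multi-View Images with Transformers | TranSplat: uses Transformers to achieve generalizable 3D Gaussian Splatting from sparse multi-view images. | depth estimation, monocular depth, 3D gaussian splatting |
| 2 | Making Large Language Models Better Planners with Reasoning-Decision Alignment | Proposes RDA-Driver, which improves large language model performance in autonomous driving planning through reasoning-decision alignment. | scene understanding, large language model, multimodal |
| 3 | OpenNav: Efficient Open Vocabulary 3D Object Detection for Smart Wheelchair Navigation | OpenNav: efficient open-vocabulary 3D object detection for smart wheelchair navigation. | open-vocabulary, open vocabulary |
| 4 | Splatt3R: Zero-shot Gaussian Splatting from Uncalibrated Image Pairs | Splatt3R: a zero-shot Gaussian Splatting method that works from uncalibrated image pairs. | gaussian splatting, splatting |
| 5 | InSpaceType: Dataset and Benchmark for Reconsidering Cross-Space Type Performance in Indoor Monocular Depth | Proposes the InSpaceType dataset and benchmark for evaluating how well indoor monocular depth estimation generalizes across space types. | depth estimation, monocular depth |
| 6 | 3D-VirtFusion: Synthetic 3D Data Augmentation through Generative Diffusion Models and Controllable Editing | 3D-VirtFusion: synthetic 3D data augmentation using generative diffusion models and controllable editing. | scene understanding, foundation model |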

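🔬 Pillar 9: Embodied Foundation Models (6 papers)

| # | Title | One-line summary | Tags |
|---|-------|------------------|------|
| 7 | ConVis: Contrastive Decoding with Hallucination Visualization for Mitigating Hallucinations in Multimodal Large Language Models | ConVis: mitigates hallucinations in multimodal large language models through contrastive decoding with hallucination visualization. | large language model, multimodal |
| 8 | LowCLIP: Adapting the CLIP Model Architecture for Low-Resource Languages in Multimodal Image Retrieval Task | LowCLIP: adapts the CLIP model architecture for multimodal image retrieval in low-resource languages. | multimodal |
| 9 | Tangram: Benchmark for Evaluating Geometric Element Recognition in Large Multimodal Models | Proposes the Tangram benchmark to evaluate how well large multimodal models recognize geometric elements. | multimodal |
| 10 | Multi-SIGATnet: A multimodal schizophrenia MRI classification algorithm using sparse interaction mechanisms and graph attention networks | Proposes a multimodal schizophrenia MRI classification algorithm based on sparse interaction mechanisms and graph attention networks. | multimodal |
| 11 | Multimodal Ensemble with Conditional Feature Fusion for Dysgraphia Diagnosis in Children from Handwriting Samples | Proposes a multimodal ensemble method based on conditional feature fusion to improve the accuracy of dysgraphia diagnosis in children. | multimodal |
| 12 | Draw Like an Artist: Complex Scene Generation with Diffusion Model via Composition, Painting, and Retouching | CxD: generates complex scene images with a diffusion model through composition, painting, and retouching. | large language model, chain-of-thought |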

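🔬 Pillar 2: RL & Architecture (3 papers)

| # | Title | One-line summary | Tags |
|---|-------|------------------|------|
| 13 | SceneDreamer360: Text-Driven 3D-Consistent Scene Generation with Panoramic Gaussian Splatting | SceneDreamer360: proposes a text-driven, 3D-consistent scene generation method based on panoramic Gaussian Splatting. | dreamer, 3D gaussian splatting, 3DGS |
| 14 | MSVM-UNet: Multi-Scale Vision Mamba UNet for Medical Image Segmentation | MSVM-UNet: a multi-scale Vision Mamba UNet model for medical image segmentation. | Mamba, SSM, state space model |
| 15 | Extremely Fine-Grained Visual Classification over Resembling Glyphs in the Wild | Proposes CCFG-Net to tackle extremely fine-grained visual classification of characters with resembling glyphs in natural scenes. | contrastive learning, scene understanding |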

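🔬 Pillar 1: Robot Control (2 papers)

| # | Title | One-line summary | Tags |
|---|-------|------------------|------|
| 16 | PhysPart: Physically Plausible Part Completion for Interactable Objects | PhysPart: proposes a diffusion-based method for physically plausible part completion of interactable objects. | manipulation, classifier-free guidance, physically plausible |
| 17 | Localization of Synthetic Manipulations in Western Blot Images | Proposes a patch-based synthetic-image detector for localizing forged regions in Western blot images. | manipulation |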

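🔬 Pillar 5: Interaction & Reaction (1 paper)

| # | Title | One-line summary | Tags |
|---|-------|------------------|------|
| 18 | InterTrack: Tracking Human Object Interaction without Object Templates | Proposes InterTrack, which tracks human-object interaction without requiring object templates. | human-object interaction, SMPL |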
