cs.CV (2024-10-25)

📊 19 papers in total | 🔗 7 with code

🎯 Research Area Navigation

Pillar 3: Spatial Perception & Semantics (6 🔗4) · Pillar 9: Embodied Foundation Models (6) · Pillar 2: RL Algorithms & Architecture (4 🔗2) · Pillar 6: Video Extraction & Matching (1 🔗1) · Pillar 4: Generative Motion (1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 3: Spatial Perception & Semantics (6 papers)

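| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | DiffGS: Functional Gaussian Splatting Diffusion | Proposes DiffGS, a latent-diffusion-based generative method for functional Gaussian Splatting that achieves high-quality, fast rendering. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 2 | ArCSEM: Artistic Colorization of SEM Images via Gaussian Splatting | Proposes ArCSEM, a Gaussian Splatting-based method for automatic artistic colorization of scanning electron microscope (SEM) images. | gaussian splatting, splatting | |
| 3 | Content-Aware Radiance Fields: Aligning Model Complexity with Scene Intricacy Through Learned Bitwidth Quantization | Proposes content-aware radiance fields that align model complexity with scene complexity through adversarial content-aware quantization. | 3D gaussian splatting, gaussian splatting, splatting | |
| 4 | Evaluation of strategies for efficient rate-distortion NeRF streaming | Studies the rate-distortion performance of NeRF streaming and proposes a strategy for streaming neural network parameters. | NeRF, neural radiance field, scene reconstruction | |
| 5 | MonoDGP: Monocular 3D Object Detection with Decoupled-Query and Geometry-Error Priors | MonoDGP: monocular 3D object detection using decoupled queries and geometry-error priors. | depth estimation, metric depth | |
| 6 | FastPCI: Motion-Structure Guided Fast Point Cloud Frame Interpolation | FastPCI: a fast, motion-structure-guided method for point cloud frame interpolation. | scene flow | |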

🔬 Pillar 9: Embodied Foundation Models (6 papers)

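| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 7 | OReole-FM: successes and challenges toward billion-parameter foundation models for high-resolution satellite imagery | OReole-FM: an exploration of billion-parameter remote sensing foundation models for high-resolution satellite imagery. | foundation model | |
| 8 | A Multimodal Approach For Endoscopic VCE Image Classification Using BiomedCLIP-PubMedBERT | Proposes a multimodal approach based on BiomedCLIP-PubMedBERT for classifying endoscopic VCE images. | multimodal | |
| 9 | TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning | TimeSuite: improves MLLM long-video understanding through grounded tuning. | large language model, multimodal, TAMP | |
| 10 | Frozen-DETR: Enhancing DETR with Image Understanding from Frozen Foundation Models | Frozen-DETR: boosts DETR object detection performance using frozen pretrained foundation models. | foundation model | |
| 11 | Turn-by-Turn Indoor Navigation for the Visually Impaired | Proposes a turn-by-turn indoor navigation system for visually impaired users, built on a smartphone and a Raspberry Pi. | large language model, multimodal | |
| 12 | MaCTG: Multi-Agent Collaborative Thought Graph for Automatic Programming | Proposes MaCTG, which uses a multi-agent collaborative graph to address inefficient task planning and hallucination in automatic programming. | large language model | |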

🔬 Pillar 2: RL Algorithms & Architecture (4 papers)

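| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 13 | Exploring Self-Supervised Learning with U-Net Masked Autoencoders and EfficientNet B7 for Improved Classification | Proposes a self-supervised learning approach that combines a U-Net masked autoencoder with EfficientNet B7 to improve image classification accuracy. | masked autoencoder | |
| 14 | Topology-aware Mamba for Crack Segmentation in Structures | Proposes CrackMamba for crack segmentation in infrastructure. | Mamba | |
| 15 | Diverse Sign Language Translation | Introduces the DivSLT task, which addresses the one-to-many mapping problem in sign language translation and improves both translation diversity and accuracy. | reinforcement learning, large language model | |
| 16 | Fusion-then-Distillation: Toward Cross-modal Positive Distillation for Domain Adaptive 3D Semantic Segmentation | Proposes Fusion-then-Distillation, a cross-modal positive distillation method for domain-adaptive 3D semantic segmentation. | distillation | |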

🔬 Pillar 6: Video Extraction & Matching (1 paper)

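| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 17 | x-RAGE: eXtended Reality -- Action & Gesture Events Dataset | x-RAGE: the first event-camera dataset of action and gesture events for extended reality. | egocentric, first-person view | |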

🔬 Pillar 4: Generative Motion (1 paper)

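| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 18 | FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality | FasterCache: a high-quality, training-free acceleration strategy for video diffusion models. | classifier-free guidance | |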

🔬 Pillar 8: Physics-based Animation (1 paper)

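| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 19 | Unsupervised Machine Learning for Detecting and Locating Human-Made Objects in 3D Point Cloud | Proposes an unsupervised learning method for detecting and locating human-made objects in 3D point clouds. | PULSE | |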