cs.CV (2024-10-25)
📊 19 papers in total | 🔗 7 with code
🎯 Area-of-interest navigation
Pillar 3: Spatial Perception & Semantics (6 🔗4)
Pillar 9: Embodied Foundation Models (6)
Pillar 2: RL Algorithms & Architecture (4 🔗2)
Pillar 6: Video Extraction & Matching (1 🔗1)
Pillar 4: Generative Motion (1)
Pillar 8: Physics-based Animation (1)
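
🔬 Pillar 3: Spatial Perception & Semantics (6 papers)

| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | DiffGS: Functional Gaussian Splatting Diffusion | Proposes DiffGS, a functional Gaussian Splatting generation method based on latent diffusion models that achieves high-quality, fast rendering. | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 2 | ArCSEM: Artistic Colorization of SEM Images via Gaussian Splatting | Proposes ArCSEM, a Gaussian Splatting-based method for automatic artistic colorization of scanning electron microscope (SEM) images. | gaussian splatting, splatting | ✅ | |
| 3 | Content-Aware Radiance Fields: Aligning Model Complexity with Scene Intricacy Through Learned Bitwidth Quantization | Proposes content-aware radiance fields, which align model complexity with scene complexity through adversarial content-aware quantization. | 3D gaussian splatting, gaussian splatting, splatting | ✅ | |
| 4 | Evaluation of strategies for efficient rate-distortion NeRF streaming | Studies the rate-distortion performance of NeRF streaming and proposes a strategy for streaming neural network parameters. | NeRF, neural radiance field, scene reconstruction | | |
| 5 | MonoDGP: Monocular 3D Object Detection with Decoupled-Query and Geometry-Error Priors | MonoDGP: monocular 3D object detection using decoupled queries and geometry-error priors. | depth estimation, metric depth | ✅ | |
| 6 | FastPCI: Motion-Structure Guided Fast Point Cloud Frame Interpolation | FastPCI: a fast point cloud frame interpolation method guided by motion and structure. | scene flow | ✅ | |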

🔬 Pillar 9: Embodied Foundation Models (6 papers)

| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 7 | OReole-FM: successes and challenges toward billion-parameter foundation models for high-resolution satellite imagery | OReole-FM: explores billion-parameter remote sensing foundation models for high-resolution satellite imagery. | foundation model | | |
| 8 | A Multimodal Approach For Endoscopic VCE Image Classification Using BiomedCLIP-PubMedBERT | Proposes a multimodal approach based on BiomedCLIP-PubMedBERT for classifying video capsule endoscopy (VCE) images. | multimodal | | |
| 9 | TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning | TimeSuite: improves the long-video understanding of MLLMs through grounded tuning. | large language model, multimodal, TAMP | | |
| 10 | Frozen-DETR: Enhancing DETR with Image Understanding from Frozen Foundation Models | Frozen-DETR: improves DETR object detection performance using frozen pretrained foundation models. | foundation model | | |
| 11 | Turn-by-Turn Indoor Navigation for the Visually Impaired | Proposes a turn-by-turn indoor navigation system for blind users built on a smartphone and a Raspberry Pi. | large language model, multimodal | | |
| 12 | MaCTG: Multi-Agent Collaborative Thought Graph for Automatic Programming | Proposes MaCTG, which uses a multi-agent collaborative graph to address inefficient task planning and hallucination in automatic programming. | large language model | | |

🔬 Pillar 2: RL Algorithms & Architecture (4 papers)

| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 13 | Exploring Self-Supervised Learning with U-Net Masked Autoencoders and EfficientNet B7 for Improved Classification | Proposes a self-supervised learning method based on a U-Net masked autoencoder and EfficientNet B7 to improve image classification accuracy. | masked autoencoder | | |
| 14 | Topology-aware Mamba for Crack Segmentation in Structures | Proposes CrackMamba for crack segmentation in infrastructure. | Mamba | ✅ | |
| 15 | Diverse Sign Language Translation | Proposes the DivSLT task, which addresses the one-to-many mapping problem in sign language translation to improve translation diversity and accuracy. | reinforcement learning, large language model | | |
| 16 | Fusion-then-Distillation: Toward Cross-modal Positive Distillation for Domain Adaptive 3D Semantic Segmentation | Proposes Fusion-then-Distillation, a method for cross-modal positive distillation in domain-adaptive 3D semantic segmentation. | distillation | ✅ | |

🔬 Pillar 6: Video Extraction & Matching (1 paper)

| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 17 | x-RAGE: eXtended Reality -- Action & Gesture Events Dataset | x-RAGE: the first event-camera dataset of action and gesture events for extended reality. | egocentric, first-person view | ✅ | |

🔬 Pillar 4: Generative Motion (1 paper)

| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 18 | FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality | FasterCache: a training-free acceleration strategy for video diffusion models that preserves high output quality. | classifier-free guidance | | |

🔬 Pillar 8: Physics-based Animation (1 paper)

| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 19 | Unsupervised Machine Learning for Detecting and Locating Human-Made Objects in 3D Point Cloud | Proposes an unsupervised learning method for detecting and locating human-made objects in 3D point clouds. | PULSE | | |