cs.CV (2024-09-03)

📊 26 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms and Architecture (10, 🔗 2) · Pillar 3: Spatial Perception and Semantics (7, 🔗 1) · Pillar 9: Embodied Foundation Models (4) · Pillar 6: Video Extraction and Matching (2, 🔗 1) · Pillar 4: Generative Motion (1) · Pillar 7: Motion Retargeting (1) · Pillar 5: Interaction and Reaction (1)

🔬 Pillar 2: RL Algorithms and Architecture (10 papers)

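| # | Title | Key point | Tags |
|---|---|---|---|
| 1 | PixelBytes: Catching Unified Embedding for Multimodal Generation | Proposes the PixelBytes embedding for unified multimodal representation learning and sequence generation. | Mamba, SSM, state space model |
| 2 | PMT-MAE: Dual-Branch Self-Supervised Learning with Distillation for Efficient Point Cloud Classification | PMT-MAE: dual-branch self-supervised learning with distillation for efficient point cloud classification. | masked autoencoder, MAE, distillation |
| 3 | Shuffle Mamba: State Space Models with Random Shuffle for Multi-Modal Image Fusion | Proposes Shuffle Mamba to address bias in multi-modal image fusion. | Mamba, state space model |
| 4 | LinFusion: 1 GPU, 1 Minute, 16K Image | LinFusion: uses linear attention to generate a 16K image in one minute on a single GPU. | Mamba, linear attention, spatial relationship |
| 5 | Dual Advancement of Representation Learning and Clustering for Sparse and Noisy Images | DARLC: jointly improves representation learning and clustering for sparse and noisy images. | representation learning, contrastive learning |
| 6 | Efficient Point Cloud Classification via Offline Distillation Framework and Negative-Weight Self-Distillation Technique | Proposes an offline distillation framework and negative-weight self-distillation to make point cloud classification more efficient and reduce model complexity. | distillation |
| 7 | Latent Distillation for Continual Object Detection at the Edge | Proposes a latent distillation method for continual object detection on edge devices. | distillation |
| 8 | AstroMAE: Redshift Prediction Using a Masked Autoencoder with a Novel Fine-Tuning Architecture | AstroMAE: proposes a redshift prediction method based on a masked autoencoder and a novel fine-tuning architecture. | masked autoencoder |
| 9 | Adaptive Explicit Knowledge Transfer for Knowledge Distillation | Proposes adaptive explicit knowledge transfer (AEKT) to improve logit distillation performance. | distillation |
| 10 | Improving Apple Object Detection with Occlusion-Enhanced Distillation | Proposes an occlusion-enhanced distillation method that makes apple detection more robust under natural occlusion. | distillation |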

🔬 Pillar 3: Spatial Perception and Semantics (7 papers)

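| # | Title | Key point | Tags |
|---|---|---|---|
| 11 | PRoGS: Progressive Rendering of Gaussian Splats | PRoGS: proposes progressive rendering of Gaussian splats to speed up loading and display of 3D scenes. | 3D gaussian splatting, 3DGS, gaussian splatting |
| 12 | $S^2$NeRF: Privacy-preserving Training Framework for NeRF | Proposes $S^2$NeRF, a secure training framework that addresses data privacy leakage in NeRF training. | NeRF, neural radiance field |
| 13 | DynOMo: Online Point Tracking by Dynamic Online Monocular Gaussian Reconstruction | DynOMo: proposes dynamic online monocular Gaussian reconstruction for online point tracking with an unposed monocular camera. | 3D gaussian splatting, gaussian splatting, splatting |
| 14 | Segmenting Object Affordances: Reproducibility and Sensitivity to Scale | A reproducible benchmark for object affordance segmentation that reveals models' sensitivity to scale. | affordance |
| 15 | DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos | DepthCrafter: generates long depth sequences consistent with the content of open-world videos. | depth estimation, optical flow |
| 16 | EPRecon: An Efficient Framework for Real-Time Panoptic 3D Reconstruction from Monocular Video | EPRecon: an efficient framework for real-time panoptic 3D reconstruction from monocular video. | scene reconstruction, scene understanding |
| 17 | Unveiling Deep Shadows: A Survey and Benchmark on Image and Video Shadow Detection, Removal, and Generation in the Deep Learning Era | Shadow detection, removal, and generation in the deep learning era: a survey and benchmark. | scene understanding |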

🔬 Pillar 9: Embodied Foundation Models (4 papers)

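| # | Title | Key point | Tags |
|---|---|---|---|
| 18 | The Era of Foundation Models in Medical Imaging is Approaching: A Scoping Review of the Clinical Value of Large-Scale Generative AI Applications in Radiology | A scoping review of the clinical value and future directions of large-scale generative AI in medical imaging. | large language model, foundation model |
| 19 | Convolutional Networks as Extremely Small Foundation Models: Visual Prompting and Theoretical Perspective | Proposes extremely small foundation models based on convolutional networks that transfer quickly to video object segmentation through visual prompting. | foundation model |
| 20 | From Data to Insights: A Covariate Analysis of the IARPA BRIAR Dataset for Multimodal Biometric Recognition Algorithms at Altitude and Range | For the IARPA BRIAR dataset, proposes a covariate-analysis-based method for evaluating the performance of multimodal biometric recognition algorithms. | multimodal |
| 21 | MetaFood3D: 3D Food Dataset with Nutrition Values | MetaFood3D: a 3D food dataset with nutrition values, intended to advance food computing research. | multimodal |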

🔬 Pillar 6: Video Extraction and Matching (2 papers)

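| # | Title | Key point | Tags |
|---|---|---|---|
| 22 | EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision | EgoPressure: proposes an egocentric vision dataset for hand pressure and pose estimation. | egocentric, egocentric vision |
| 23 | Geometry-Aware Feature Matching for Large-Scale Structure from Motion | Proposes a geometry-aware feature matching method that improves matching accuracy and density for large-scale SfM under viewpoint changes. | feature matching |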

🔬 Pillar 4: Generative Motion (1 paper)

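| # | Title | Key point | Tags |
|---|---|---|---|
| 24 | Dynamic Motion Synthesis: Masked Audio-Text Conditioned Spatio-Temporal Transformers | Proposes a dynamic motion synthesis framework based on masked audio-text conditioned spatio-temporal Transformers. | motion synthesis, motion generation, VQ-VAE |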

🔬 Pillar 7: Motion Retargeting (1 paper)

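| # | Title | Key point | Tags |
|---|---|---|---|
| 25 | Blocks as Probes: Dissecting Categorization Ability of Large Multimodal Models | Proposes the ComBo benchmark for in-depth evaluation of the categorization ability of large multimodal models (LMMs). | spatial relationship, multimodal |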

🔬 Pillar 5: Interaction and Reaction (1 paper)

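| # | Title | Key point | Tags |
|---|---|---|---|
| 26 | A New People-Object Interaction Dataset and NVS Benchmarks | Proposes a new multi-person/single-person interaction dataset and establishes novel view synthesis (NVS) benchmarks based on it. | human-object interaction, SMPL |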
