cs.CV (2023-12-15)

📊 34 papers in total | 🔗 16 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (12 🔗4) · Pillar 2: RL Algorithms and Architecture (RL & Architecture) (9 🔗5) · Pillar 9: Embodied Large Models (Embodied Foundation Models) (9 🔗5) · Pillar 7: Motion Retargeting (2 🔗1) · Pillar 4: Generative Motion (1) · Pillar 1: Robot Control (1 🔗1)

🔬 Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (12 papers)

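# | Title | One-line summary | Tags | 🔗
1 | Weakly-Supervised 3D Visual Grounding based on Visual Language Alignment | Proposes 3D-VLA, a weakly supervised 3D visual grounding method based on visual-language alignment. | scene understanding, VLA, visual grounding
2 | From-Ground-To-Objects: Coarse-to-Fine Self-supervised Monocular Depth Estimation of Dynamic Objects with Ground Contact Prior | Proposes a coarse-to-fine self-supervised monocular depth estimation method based on a ground-contact prior, improving depth accuracy for dynamic objects. | depth estimation, monocular depth
3 | LAENeRF: Local Appearance Editing for Neural Radiance Fields | LAENeRF: local appearance editing for neural radiance fields, enabling interactive, fast, and memory-efficient style transfer. | NeRF, neural radiance field
4 | Deep Event Visual Odometry | DEVO: a high-performance monocular event-camera visual odometry system. | visual odometry
5 | SlimmeRF: Slimmable Radiance Fields | SlimmeRF: proposes slimmable neural radiance fields that allow a flexible trade-off between model size and accuracy. | NeRF, neural radiance field, scene reconstruction
6 | Multispectral Stereo-Image Fusion for 3D Hyperspectral Scene Reconstruction | Proposes a multispectral stereo-image fusion method for 3D hyperspectral scene reconstruction. | scene reconstruction
7 | PLGSLAM: Progressive Neural Scene Represenation with Local to Global Bundle Adjustment | PLGSLAM: a progressive neural scene representation with local-to-global bundle adjustment for high-accuracy SLAM in large-scale scenes. | visual SLAM, scene reconstruction
8 | SLS4D: Sparse Latent Space for 4D Novel View Synthesis | SLS4D: uses a sparse latent space for novel view synthesis of 4D scenes. | NeRF, neural radiance field
9 | High-Quality 3D Creation from A Single Image Using Subject-Specific Knowledge Prior | Proposes a method for high-quality 3D model generation from a single image using a subject-specific knowledge prior, addressing the scarcity of 3D data in robotics. | NeRF
10 | RANRAC: Robust Neural Scene Representations via Random Ray Consensus | Proposes RANRAC to address the problem of image inconsistency. | neural radiance field
11 | Towards Transferable Targeted 3D Adversarial Attack in the Physical World | Proposes the TT3D framework for transferable targeted 3D adversarial attacks in the physical world. | NeRF
12 | Hierarchical Graph Pattern Understanding for Zero-Shot VOS | Proposes HGPU, a hierarchical graph pattern understanding network, to address optical-flow failure in zero-shot video object segmentation. | optical flow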

🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (9 papers)

# | Title | One-line summary | Tags | 🔗
13 | WAVER: Writing-style Agnostic Text-Video Retrieval via Distilling Vision-Language Models Through Open-Vocabulary Knowledge | Proposes the WAVER framework, which uses knowledge distillation to handle writing-style variation in text-video retrieval. | distillation, open-vocabulary, open vocabulary
14 | SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery | SkySense: a multi-modal remote sensing foundation model for universal interpretation of Earth observation imagery. | MAE, contrastive learning, spatiotemporal
15 | T-MAE: Temporal Masked Autoencoders for Point Cloud Representation Learning | Proposes T-MAE, which uses temporal masked autoencoders to improve LiDAR point cloud representation learning. | representation learning, masked autoencoder, MAE
16 | Part Representation Learning with Teacher-Student Decoder for Occluded Person Re-identification | Proposes a part representation learning framework with a teacher-student decoder for occluded person re-identification. | representation learning, teacher-student, distillation
17 | FastSR-NeRF: Improving NeRF Efficiency on Consumer Devices with A Simple Super-Resolution Pipeline | FastSR-NeRF: uses a super-resolution pipeline to improve NeRF efficiency on consumer devices. | distillation, NeRF, neural radiance field
18 | Rich Human Feedback for Text-to-Image Generation | Proposes the RichHF-18K dataset, using rich human feedback to improve text-to-image generation quality. | reinforcement learning, RLHF, large language model
19 | Pixel-Superpixel Contrastive Learning and Pseudo-Label Correction for Hyperspectral Image Clustering | Proposes pixel-superpixel contrastive learning with pseudo-label correction for hyperspectral image clustering. | contrastive learning, HSI
20 | Let All be Whitened: Multi-teacher Distillation for Efficient Visual Retrieval | Proposes Whiten-MTD, a multi-teacher distillation framework for efficient visual retrieval. | distillation
21 | CLAF: Contrastive Learning with Augmented Features for Imbalanced Semi-Supervised Learning | Proposes CLAF, which uses contrastive learning with augmented features to address imbalanced semi-supervised learning. | contrastive learning

🔬 Pillar 9: Embodied Large Models (Embodied Foundation Models) (9 papers)

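# | Title | One-line summary | Tags | 🔗
22 | Towards the Unification of Generative and Discriminative Visual Foundation Model: A Survey | A survey on unifying generative and discriminative capabilities in visual foundation models, exploring future directions. | large language model, foundation model
23 | PathoDuet: Foundation Models for Pathological Slide Analysis of H&E and IHC Stains | PathoDuet: pathology foundation models for analyzing H&E- and IHC-stained slides. | foundation model
24 | Structural Information Guided Multimodal Pre-training for Vehicle-centric Perception | Proposes VehicleMAE, which uses structural information to guide multimodal pre-training for vehicle-centric perception. | multimodal
25 | FoMo: Multi-Modal, Multi-Scale and Multi-Task Remote Sensing Foundation Models for Forest Monitoring | Proposes FoMo-Net, a multi-modal remote sensing foundation model for forest monitoring, along with the FoMo-Bench benchmark. | foundation model
26 | Enlighten-Your-Voice: When Multimodal Meets Zero-shot Low-light Image Enhancement | Proposes Enlighten-Your-Voice, a multimodal zero-shot low-light image enhancement framework that improves user interaction. | multimodal
27 | EDA: Evolving and Distinct Anchors for Multimodal Motion Prediction | Proposes EDA (Evolving and Distinct Anchors) to address regression capacity and representativeness in multimodal motion prediction. | multimodal
28 | Osprey: Pixel Understanding with Visual Instruction Tuning | Osprey: pixel-level image understanding via visual instruction tuning. | large language model, multimodal
29 | UniAR: A Unified model for predicting human Attention and Responses on visual content | UniAR: a unified model for predicting human attention and responses on visual content. | multimodal
30 | TAB: Text-Align Anomaly Backbone Model for Industrial Inspection Tasks | Proposes the Text-Align Anomaly Backbone (TAB) model for industrial defect detection and localization. | foundation model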

🔬 Pillar 7: Motion Retargeting (2 papers)

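# | Title | One-line summary | Tags | 🔗
31 | GSVA: Generalized Segmentation via Multimodal Large Language Models | Proposes GSVA, which uses multimodal large language models for generalized referring expression segmentation. | spatial relationship, large language model, multimodal
32 | nuScenes Knowledge Graph -- A comprehensive semantic representation of traffic scenes for trajectory prediction | Proposes the nuScenes Knowledge Graph (nSKG), a comprehensive semantic representation of traffic scenes for trajectory prediction. | spatial relationship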

🔬 Pillar 4: Generative Motion (1 paper)

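# | Title | One-line summary | Tags | 🔗
33 | Ins-HOI: Instance Aware Human-Object Interactions Recovery | Proposes the Ins-HOI framework, which reconstructs human-object interactions using instance-aware implicit fields. | penetration, human-object interaction, HOI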

🔬 Pillar 1: Robot Control (1 paper)

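# | Title | One-line summary | Tags | 🔗
34 | Collaborating Foundation Models for Domain Generalized Semantic Segmentation | Proposes the CLOUDS framework, which uses collaborating foundation models to improve domain-generalized semantic segmentation. | domain randomization, foundation model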
