cs.CV (2023-12-15)

📊 34 papers in total | 🔗 16 with code

🎯 Topic navigation

Pillar 3: Perception & Semantics (12 🔗4) · Pillar 2: RL & Architecture (9 🔗5) · Pillar 9: Embodied Foundation Models (9 🔗5) · Pillar 7: Motion Retargeting (2 🔗1) · Pillar 4: Generative Motion (1) · Pillar 1: Robot Control (1 🔗1)

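🔬 Pillar 3: Perception & Semantics (12 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	Weakly-Supervised 3D Visual Grounding based on Visual Language Alignment	Proposes 3D-VLA, a weakly supervised 3D visual grounding method based on vision-language alignment.	scene understanding VLA visual grounding
2	From-Ground-To-Objects: Coarse-to-Fine Self-supervised Monocular Depth Estimation of Dynamic Objects with Ground Contact Prior	Proposes a coarse-to-fine self-supervised monocular depth estimation method built on a ground-contact prior, improving depth accuracy for dynamic objects.	depth estimation monocular depth
3	LAENeRF: Local Appearance Editing for Neural Radiance Fields	LAENeRF: local appearance editing for neural radiance fields, enabling interactive, fast, and memory-efficient style transfer.	NeRF neural radiance field
4	Deep Event Visual Odometry	DEVO: a high-performance monocular visual odometry system for event cameras.	visual odometry	✅
5	SlimmeRF: Slimmable Radiance Fields	SlimmeRF: proposes slimmable neural radiance fields that allow a flexible trade-off between model size and accuracy.	NeRF neural radiance field scene reconstruction	✅
6	Multispectral Stereo-Image Fusion for 3D Hyperspectral Scene Reconstruction	Proposes a multispectral stereo-image fusion method for 3D hyperspectral scene reconstruction.	scene reconstruction
7	PLGSLAM: Progressive Neural Scene Represenation with Local to Global Bundle Adjustment	PLGSLAM: a progressive neural scene representation with local-to-global bundle adjustment, enabling high-accuracy SLAM in large-scale scenes.	visual SLAM scene reconstruction	✅
8	SLS4D: Sparse Latent Space for 4D Novel View Synthesis	SLS4D: novel view synthesis for 4D scenes using a sparse latent space.	NeRF neural radiance field
9	High-Quality 3D Creation from A Single Image Using Subject-Specific Knowledge Prior	Proposes a method for creating high-quality 3D models from a single image using a subject-specific knowledge prior, addressing the scarcity of 3D data in robotics.	NeRF
10	RANRAC: Robust Neural Scene Representations via Random Ray Consensus	Proposes RANRAC, which uses random ray consensus to make neural scene representations robust to inconsistent input images.	neural radiance field
11	Towards Transferable Targeted 3D Adversarial Attack in the Physical World	Proposes the TT3D framework for transferable targeted 3D adversarial attacks in the physical world.	NeRF
12	Hierarchical Graph Pattern Understanding for Zero-Shot VOS	Proposes HGPU, a hierarchical graph pattern understanding network that addresses optical-flow failures in zero-shot video object segmentation (VOS).	optical flow	✅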

🔬 Pillar 2: RL & Architecture (9 papers)

#	Title	One-line summary	Tags	🔗	⭐
13	WAVER: Writing-style Agnostic Text-Video Retrieval via Distilling Vision-Language Models Through Open-Vocabulary Knowledge	Proposes the WAVER framework, which uses knowledge distillation to handle differences in writing style in text-video retrieval.	distillation open-vocabulary	✅
14	SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery	SkySense: a multi-modal remote sensing foundation model for universal interpretation of Earth observation imagery.	MAE contrastive learning spatiotemporal
15	T-MAE: Temporal Masked Autoencoders for Point Cloud Representation Learning	Proposes T-MAE, which uses temporal masked autoencoders to improve LiDAR point cloud representation learning.	representation learning masked autoencoder MAE	✅
16	Part Representation Learning with Teacher-Student Decoder for Occluded Person Re-identification	Proposes a part representation learning framework with a teacher-student decoder for occluded person re-identification.	representation learning teacher-student distillation	✅
17	FastSR-NeRF: Improving NeRF Efficiency on Consumer Devices with A Simple Super-Resolution Pipeline	FastSR-NeRF: uses a super-resolution pipeline to improve NeRF efficiency on consumer devices.	distillation NeRF neural radiance field
18	Rich Human Feedback for Text-to-Image Generation	Proposes the RichHF-18K dataset and uses its rich human feedback to improve text-to-image generation quality.	reinforcement learning RLHF large language model	✅
19	Pixel-Superpixel Contrastive Learning and Pseudo-Label Correction for Hyperspectral Image Clustering	Proposes pixel-superpixel contrastive learning with pseudo-label correction for hyperspectral image clustering.	contrastive learning HSI
20	Let All be Whitened: Multi-teacher Distillation for Efficient Visual Retrieval	Proposes Whiten-MTD, a multi-teacher distillation framework for efficient visual retrieval.	distillation	✅
21	CLAF: Contrastive Learning with Augmented Features for Imbalanced Semi-Supervised Learning	Proposes CLAF, which applies contrastive learning with augmented features to imbalanced semi-supervised learning.	contrastive learning

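🔬 Pillar 9: Embodied Foundation Models (9 papers)

#	Title	One-line summary	Tags	🔗	⭐
22	Towards the Unification of Generative and Discriminative Visual Foundation Model: A Survey	A survey on unifying generative and discriminative capabilities in visual foundation models, with an outlook on future directions.	large language model foundation model
23	PathoDuet: Foundation Models for Pathological Slide Analysis of H&E and IHC Stains	PathoDuet: pathology foundation models for analyzing H&E- and IHC-stained slides.	foundation model	✅
24	Structural Information Guided Multimodal Pre-training for Vehicle-centric Perception	Proposes VehicleMAE, which uses structural information to guide multimodal pre-training for vehicle-centric perception.	multimodal	✅
25	FoMo: Multi-Modal, Multi-Scale and Multi-Task Remote Sensing Foundation Models for Forest Monitoring	Proposes FoMo-Net, a multi-modal remote sensing foundation model for forest monitoring, together with the FoMo-Bench benchmark.	foundation model
26	Enlighten-Your-Voice: When Multimodal Meets Zero-shot Low-light Image Enhancement	Proposes Enlighten-Your-Voice, a multimodal zero-shot low-light image enhancement framework that improves the user interaction experience.	multimodal	✅
27	EDA: Evolving and Distinct Anchors for Multimodal Motion Prediction	Proposes EDA (Evolving and Distinct Anchors) to address limited regression capability and poor anchor representativeness in multimodal motion prediction.	multimodal	✅
28	Osprey: Pixel Understanding with Visual Instruction Tuning	Osprey: pixel-level image understanding via visual instruction tuning.	large language model multimodal	✅
29	UniAR: A Unified model for predicting human Attention and Responses on visual content	UniAR: a unified model that predicts human attention and responses on visual content.	multimodal
30	TAB: Text-Align Anomaly Backbone Model for Industrial Inspection Tasks	Proposes the Text-Align Anomaly Backbone (TAB) model for industrial defect detection and localization.	foundation model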

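🔬 Pillar 7: Motion Retargeting (2 papers)

#	Title	One-line summary	Tags	🔗	⭐
31	GSVA: Generalized Segmentation via Multimodal Large Language Models	Proposes GSVA, which tackles generalized referring expression segmentation with multimodal large language models.	spatial relationship large language model multimodal
32	nuScenes Knowledge Graph -- A comprehensive semantic representation of traffic scenes for trajectory prediction	Proposes the nuScenes Knowledge Graph (nSKG), a comprehensive semantic representation of traffic scenes for trajectory prediction.	spatial relationship	✅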

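🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
33	Ins-HOI: Instance Aware Human-Object Interactions Recovery	Proposes the Ins-HOI framework, which reconstructs human-object interactions with instance-aware implicit fields.	penetration human-object interaction HOI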

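🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
34	Collaborating Foundation Models for Domain Generalized Semantic Segmentation	Proposes the CLOUDS framework, which combines collaborating foundation models to improve domain-generalized semantic segmentation.	domain randomization foundation model	✅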
