cs.CV(2023-12-14)

📊 40 papers in total | 🔗 13 with code

🎯 Navigation by Interest Area

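Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (17 🔗6) · Pillar 2: RL Algorithms and Architecture (RL & Architecture) (12 🔗5) · Pillar 9: Embodied Large Models (Embodied Foundation Models) (7 🔗1) · Pillar 1: Robot Control (3) · Pillar 6: Video Extraction and Matching (Video Extraction) (1 🔗1)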

🔬 Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (17 papers)

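# | Title | One-line summary | Tags | 🔗
1 | 3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting | Proposes 3DGS-Avatar, based on deformable 3D Gaussian splatting, for fast reconstruction of animatable avatars. | 3D gaussian splatting 3DGS gaussian splatting
2 | iComMa: Inverting 3D Gaussian Splatting for Camera Pose Estimation via Comparing and Matching | iComMa: estimates camera pose by inverting 3D Gaussian splatting through comparing and matching. | 3D gaussian splatting 3DGS gaussian splatting
3 | OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers | OMG: open-vocabulary motion generation via a mixture of controllers. | open-vocabulary open vocabulary text-to-motion
4 | Dietary Assessment with Multimodal ChatGPT: A Systematic Analysis | Uses multimodal ChatGPT for dietary assessment, reaching up to 87.5% food-detection accuracy without fine-tuning. | scene understanding foundation model multimodal
5 | LEMON: Learning 3D Human-Object Interaction Relation from 2D Images | LEMON: learns 3D human-object interaction relations from 2D images to advance embodied AI. | affordance human-object interaction embodied AI
6 | CF-NeRF: Camera Parameter Free Neural Radiance Fields with Incremental Learning | Proposes CF-NeRF, which uses incremental learning to reconstruct neural radiance fields without camera parameters and handles scenes with complex rotations. | NeRF neural radiance field
7 | ColNeRF: Collaboration for Generalizable Sparse Input Neural Radiance Field | ColNeRF: a collaborative neural radiance field for sparse inputs that improves generalization. | NeRF neural radiance field
8 | Aleth-NeRF: Illumination Adaptive NeRF with Concealing Field Assumption | Aleth-NeRF: an illumination-adaptive NeRF based on the concealing field assumption, addressing NeRF reconstruction in low-light and overexposed scenes. | NeRF neural radiance field
9 | SpectralNeRF: Physically Based Spectral Rendering with Neural Radiance Field | Proposes SpectralNeRF, a NeRF-based physically based spectral rendering method that improves novel view synthesis quality. | NeRF neural radiance field
10 | OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments | OccNeRF: proposes a 3D scene occupancy prediction method that requires no LiDAR. | depth estimation open-vocabulary open vocabulary
11 | VMT-Adapter: Parameter-Efficient Transfer Learning for Multi-Task Dense Scene Understanding | Proposes VMT-Adapter for parameter-efficient transfer learning in multi-task dense scene understanding. | scene understanding
12 | LatentEditor: Text Driven Local Editing of 3D Scenes | LatentEditor: proposes a text-driven framework for local editing of 3D scenes that improves editing speed and quality. | NeRF scene reconstruction
13 | ZeroRF: Fast Sparse View 360° Reconstruction with Zero Pretraining | ZeroRF: a fast sparse-view 360° reconstruction method that requires no pretraining. | NeRF neural radiance field
14 | Text2Immersion: Generative Immersive Scene with 3D Gaussians | Text2Immersion: generates high-quality, text-driven immersive scenes with 3D Gaussians. | depth estimation
15 | Living Scenes: Multi-object Relocalization and Reconstruction in Changing 3D Environments | MoRE: a method for multi-object relocalization and reconstruction in changing 3D environments. | scene understanding
16 | VaLID: Variable-Length Input Diffusion for Novel View Synthesis | Proposes VaLID, which uses a variable-length-input diffusion model for high-quality novel view synthesis. | neural radiance field
17 | CT-MVSNet: Efficient Multi-View Stereo with Cross-scale Transformer | Proposes CT-MVSNet to address the computational cost of high-resolution depth estimation. | depth estimation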

🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (12 papers)

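# | Title | One-line summary | Tags | 🔗
18 | Motion Flow Matching for Human Motion Synthesis and Editing | Proposes Motion Flow Matching, which accelerates human motion synthesis and editing and improves sampling efficiency. | flow matching text-to-motion motion synthesis
19 | SKDF: A Simple Knowledge Distillation Framework for Distilling Open-Vocabulary Knowledge to Open-world Object Detector | Proposes the SKDF framework to address knowledge distillation in open-world object detection. | distillation open-vocabulary open vocabulary
20 | Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers | Proposes a Transformer-based hybrid Triplane-Gaussian representation for fast, generalizable single-view 3D reconstruction. | distillation gaussian splatting splatting
21 | Text-Guided Face Recognition using Multi-Granularity Cross-Modal Contrastive Learning | Proposes a text-guided face recognition method based on multi-granularity cross-modal contrastive learning that improves recognition on low-quality images. | contrastive learning multimodal
22 | CLIP-guided Federated Learning on Heterogeneous and Long-Tailed Data | Proposes CLIP2FL, which uses the CLIP model to improve federated learning on heterogeneous, long-tailed data. | contrastive learning distillation open-vocabulary
23 | Promptable Behaviors: Personalizing Multi-Objective Rewards from Human Preferences | Promptable Behaviors: personalizes multi-objective rewards from human preferences to enable customizable robot behaviors. | reinforcement learning embodied AI
24 | Stable Score Distillation for High-Quality 3D Generation | Proposes Stable Score Distillation (SSD) to improve high-quality 3D content generation. | distillation
25 | Dataset Distillation via Adversarial Prediction Matching | Proposes a dataset distillation method based on adversarial prediction matching that compresses datasets efficiently while preserving model performance. | distillation
26 | RankDVQA-mini: Knowledge Distillation-Driven Deep Video Quality Assessment | Proposes RankDVQA-mini, which compresses the RankDVQA model through knowledge distillation for lightweight video quality assessment. | distillation
27 | Generative Model-based Feature Knowledge Distillation for Action Recognition | Proposes a generative-model-based feature knowledge distillation framework that improves small-model performance in video action recognition. | distillation
28 | Incomplete Contrastive Multi-View Clustering with High-Confidence Guiding | Proposes ICMVC, which uses high-confidence guiding to address incomplete multi-view clustering. | representation learning contrastive learning
29 | Segment Beyond View: Handling Partially Missing Modality for Audio-Visual Semantic Segmentation | Proposes the SBV model, which uses audio information to enhance visual semantic segmentation and addresses perception beyond the field of view of augmented reality devices. | teacher-student distillation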

🔬 Pillar 9: Embodied Large Models (Embodied Foundation Models) (7 papers)

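# | Title | One-line summary | Tags | 🔗
30 | General Object Foundation Model for Images and Videos at Scale | Proposes GLEE, a general object foundation model for images and videos that enables object perception in open-world scenarios. | large language model foundation model zero-shot transfer
31 | Holodeck: Language Guided Generation of 3D Embodied AI Environments | Holodeck: language-guided generation of 3D embodied AI environments without human intervention. | embodied AI large language model
32 | BDHT: Generative AI Enables Causality Analysis for Mild Cognitive Impairment | Proposes BDHT, a brain diffusion model based on generative adversarial networks, for causality analysis of mild cognitive impairment. | multimodal
33 | Exploring Transferability for Randomized Smoothing | Proposes a pretraining method based on data distribution expansion that improves the transferable certified robustness of randomized smoothing models. | foundation model
34 | Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models | Proposes DepictQA, which uses multimodal large language models for human-like image quality assessment, moving beyond the limits of conventional scoring. | large language model
35 | Training-free Zero-shot Composed Image Retrieval with Local Concept Reranking | Proposes a training-free zero-shot composed image retrieval method based on local concept reranking. | foundation model
36 | CogAgent: A Visual Language Model for GUI Agents | CogAgent: a visual language model for GUI agents that improves GUI understanding and navigation. | large language model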

🔬 Pillar 1: Robot Control (3 papers)

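# | Title | One-line summary | Tags | 🔗
37 | Interactive Humanoid: Online Full-Body Motion Reaction Synthesis with Social Affordance Canonicalization and Forecasting | Proposes an online full-body motion reaction synthesis method for humanoids based on social affordance. | humanoid affordance reaction synthesis
38 | DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving | DriveMLM: aligns multimodal large language models with behavioral planning states for autonomous driving. | motion planning large language model multimodal
39 | ProSGNeRF: Progressive Dynamic Neural Scene Graph with Frequency Modulated Auto-Encoder in Urban Scenes | ProSGNeRF: a progressive method for dynamic neural scene graphs in urban scenes, combined with a frequency-modulated auto-encoder. | manipulation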

🔬 Pillar 6: Video Extraction and Matching (Video Extraction) (1 paper)

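# | Title | One-line summary | Tags | 🔗
40 | Towards Robust and Expressive Whole-body Human Pose and Shape Estimation | Proposes a new framework to improve the robustness of whole-body pose and shape estimation. | SMPL-X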
