cs.CV (2023-12-14)

📊 40 papers in total | 🔗 13 with code

🎯 Areas of Interest

Pillar 3: Perception & Semantics (17 🔗6)
Pillar 2: RL & Architecture (12 🔗5)
Pillar 9: Embodied Foundation Models (7 🔗1)
Pillar 1: Robot Control (3)
Pillar 6: Video Extraction (1 🔗1)

🔬 Pillar 3: Perception & Semantics (17 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting	Proposes 3DGS-Avatar, built on deformable 3D Gaussian Splatting, for fast reconstruction of animatable avatars.	3D gaussian splatting 3DGS gaussian splatting
2	iComMa: Inverting 3D Gaussian Splatting for Camera Pose Estimation via Comparing and Matching	iComMa: estimates camera pose by inverting 3D Gaussian Splatting through comparing and matching.	3D gaussian splatting 3DGS gaussian splatting
3	OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers	OMG: open-vocabulary motion generation via a mixture of controllers.	open-vocabulary open vocabulary text-to-motion	✅
4	Dietary Assessment with Multimodal ChatGPT: A Systematic Analysis	Uses multimodal ChatGPT for dietary assessment, reaching up to 87.5% food-detection accuracy without fine-tuning.	scene understanding foundation model multimodal
5	LEMON: Learning 3D Human-Object Interaction Relation from 2D Images	LEMON: learns 3D human-object interaction relations from 2D images to advance embodied AI.	affordance human-object interaction embodied AI
6	CF-NeRF: Camera Parameter Free Neural Radiance Fields with Incremental Learning	Proposes CF-NeRF, which uses incremental learning to reconstruct neural radiance fields without camera parameters and handles scenes with complex rotations.	NeRF neural radiance field
7	ColNeRF: Collaboration for Generalizable Sparse Input Neural Radiance Field	ColNeRF: a collaborative neural radiance field for sparse inputs with improved generalization.	NeRF neural radiance field	✅
8	Aleth-NeRF: Illumination Adaptive NeRF with Concealing Field Assumption	Aleth-NeRF: an illumination-adaptive NeRF based on the concealing field assumption, addressing NeRF reconstruction in low-light and over-exposed scenes.	NeRF neural radiance field	✅
9	SpectralNeRF: Physically Based Spectral Rendering with Neural Radiance Field	Proposes SpectralNeRF, a physically based spectral rendering method built on NeRF that improves novel view synthesis quality.	NeRF neural radiance field	✅
10	OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments	OccNeRF: a 3D occupancy prediction method that requires no LiDAR.	depth estimation open-vocabulary open vocabulary
11	VMT-Adapter: Parameter-Efficient Transfer Learning for Multi-Task Dense Scene Understanding	Proposes VMT-Adapter for parameter-efficient transfer learning in multi-task dense scene understanding.	scene understanding
12	LatentEditor: Text Driven Local Editing of 3D Scenes	LatentEditor: a framework for text-driven local editing of 3D scenes that improves editing speed and quality.	NeRF scene reconstruction
13	ZeroRF: Fast Sparse View 360° Reconstruction with Zero Pretraining	ZeroRF: a fast sparse-view 360° reconstruction method that needs no pretraining.	NeRF neural radiance field	✅
14	Text2Immersion: Generative Immersive Scene with 3D Gaussians	Text2Immersion: generates high-quality, text-driven immersive scenes with 3D Gaussians.	depth estimation
15	Living Scenes: Multi-object Relocalization and Reconstruction in Changing 3D Environments	MoRE: a method for multi-object relocalization and reconstruction in changing 3D environments.	scene understanding
16	VaLID: Variable-Length Input Diffusion for Novel View Synthesis	Proposes VaLID, a variable-length input diffusion model for high-quality novel view synthesis.	neural radiance field
17	CT-MVSNet: Efficient Multi-View Stereo with Cross-scale Transformer	Proposes CT-MVSNet to reduce the computational cost of high-resolution depth estimation.	depth estimation	✅

🔬 Pillar 2: RL & Architecture (12 papers)

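#	Title	One-line summary	Tags	🔗	⭐
18	Motion Flow Matching for Human Motion Synthesis and Editing	Proposes Motion Flow Matching, which speeds up human motion synthesis and editing by improving sampling efficiency.	flow matching text-to-motion motion synthesis
19	SKDF: A Simple Knowledge Distillation Framework for Distilling Open-Vocabulary Knowledge to Open-world Object Detector	Proposes the SKDF framework to address knowledge distillation for open-world object detection.	distillation open-vocabulary open vocabulary	✅
20	Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers	Proposes a Transformer-based hybrid Triplane-Gaussian representation for fast, generalizable single-view 3D reconstruction.	distillation gaussian splatting splatting	✅
21	Text-Guided Face Recognition using Multi-Granularity Cross-Modal Contrastive Learning	Proposes text-guided face recognition based on multi-granularity cross-modal contrastive learning, improving recognition on low-quality images.	contrastive learning multimodal
22	CLIP-guided Federated Learning on Heterogeneous and Long-Tailed Data	Proposes CLIP2FL, which uses CLIP to improve federated learning on heterogeneous, long-tailed data.	contrastive learning distillation open-vocabulary
23	Promptable Behaviors: Personalizing Multi-Objective Rewards from Human Preferences	Promptable Behaviors: personalizes multi-objective rewards from human preferences to enable customizable robot behaviors.	reinforcement learning embodied AI
24	Stable Score Distillation for High-Quality 3D Generation	Proposes Stable Score Distillation (SSD) to improve the quality of generated 3D content.	distillation
25	Dataset Distillation via Adversarial Prediction Matching	Proposes dataset distillation via adversarial prediction matching, compressing datasets efficiently while preserving model performance.	distillation
26	RankDVQA-mini: Knowledge Distillation-Driven Deep Video Quality Assessment	Proposes RankDVQA-mini, which compresses the RankDVQA model through knowledge distillation for lightweight video quality assessment.	distillation	✅
27	Generative Model-based Feature Knowledge Distillation for Action Recognition	Proposes a generative-model-based feature knowledge distillation framework that improves small models for video action recognition.	distillation	✅
28	Incomplete Contrastive Multi-View Clustering with High-Confidence Guiding	Proposes ICMVC, which tackles incomplete multi-view clustering with high-confidence guiding.	representation learning contrastive learning	✅
29	Segment Beyond View: Handling Partially Missing Modality for Audio-Visual Semantic Segmentation	Proposes SBV, which uses audio to augment visual semantic segmentation and address out-of-view perception on augmented reality devices.	teacher-student distillation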

🔬 Pillar 9: Embodied Foundation Models (7 papers)

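#	Title	One-line summary	Tags	🔗	⭐
30	General Object Foundation Model for Images and Videos at Scale	Proposes GLEE, a general object foundation model for images and videos that enables object perception in open-world scenarios.	large language model foundation model zero-shot transfer
31	Holodeck: Language Guided Generation of 3D Embodied AI Environments	Holodeck: generates 3D embodied AI environments from language guidance without manual effort.	embodied AI large language model
32	BDHT: Generative AI Enables Causality Analysis for Mild Cognitive Impairment	Proposes BDHT, a brain diffusion model based on generative adversarial networks, for causality analysis of mild cognitive impairment.	multimodal
33	Exploring Transferability for Randomized Smoothing	Proposes a pretraining method that extends the data distribution to improve the transferable certified robustness of randomized smoothing models.	foundation model
34	Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models	Proposes DepictQA, which uses multimodal large language models for human-like image quality assessment that goes beyond conventional scores.	large language model
35	Training-free Zero-shot Composed Image Retrieval with Local Concept Reranking	Proposes a training-free, zero-shot composed image retrieval method based on local concept reranking.	foundation model
36	CogAgent: A Visual Language Model for GUI Agents	CogAgent: a visual language model for GUI agents that improves GUI understanding and navigation.	large language model	✅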

🔬 Pillar 1: Robot Control (3 papers)

#	Title	One-line summary	Tags	🔗	⭐
37	Interactive Humanoid: Online Full-Body Motion Reaction Synthesis with Social Affordance Canonicalization and Forecasting	Proposes online full-body reaction synthesis for humanoids based on social affordance.	humanoid affordance reaction synthesis
38	DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving	DriveMLM: aligns a multimodal large language model with behavioral planning states for autonomous driving.	motion planning large language model multimodal
39	ProSGNeRF: Progressive Dynamic Neural Scene Graph with Frequency Modulated Auto-Encoder in Urban Scenes	ProSGNeRF: a progressive dynamic neural scene graph for urban scenes, combined with a frequency-modulated auto-encoder.	manipulation

🔬 Pillar 6: Video Extraction (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
40	Towards Robust and Expressive Whole-body Human Pose and Shape Estimation	Proposes a new framework that makes whole-body human pose and shape estimation more robust.	SMPL-X	✅
