cs.CV (2023-12-07)

📊 31 papers in total | 🔗 7 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (11) · Pillar 9: Embodied Foundation Models (10 🔗2) · Pillar 2: RL & Architecture (4 🔗3) · Pillar 1: Robot Control (2 🔗1) · Pillar 4: Generative Motion (2) · Pillar 6: Video Extraction (1 🔗1) · Pillar 5: Interaction & Reaction (1)

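🔬 Pillar 3: Perception & Semantics (11 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
1	Open-Vocabulary Segmentation with Semantic-Assisted Calibration	Proposes SCAN, a semantic-assisted calibration network, to address in-vocabulary bias and domain bias in open-vocabulary segmentation.	open-vocabulary open vocabulary
2	GSGFormer: Generative Social Graph Transformer for Multimodal Pedestrian Trajectory Prediction	GSGFormer: a generative social graph Transformer for multimodal pedestrian trajectory prediction.	semantic map multimodal
3	Camera Height Doesn't Change: Unsupervised Training for Metric Monocular Road-Scene Depth Estimation	Proposes FUMET, a framework that trains a monocular depth network without supervision from driving videos alone, recovering absolute scale for metric depth estimation.	depth estimation monocular depth metric depth
4	EAGLES: Efficient Accelerated 3D Gaussians with Lightweight EncodingS	EAGLES: lightweight encodings for efficient, accelerated 3D Gaussians that significantly reduce memory usage.	3D gaussian splatting gaussian splatting splatting
5	Text as Image: Learning Transferable Adapter for Multi-Label Classification	Proposes the Text as Image method, which learns a transferable adapter for multi-label image classification.	open-vocabulary open vocabulary large language model
6	Auto-Vocabulary Semantic Segmentation	Proposes the AutoSeg framework for auto-vocabulary semantic segmentation without predefined categories.	open-vocabulary open vocabulary large language model
7	MonoGaussianAvatar: Monocular Gaussian Point-based Head Avatar	Proposes MonoGaussianAvatar, which reconstructs and animates a photorealistic head avatar from monocular video.	gaussian splatting splatting implicit representation
8	VOODOO 3D: Volumetric Portrait Disentanglement for One-Shot 3D Head Reenactment	Proposes VOODOO 3D, a volumetric disentanglement framework for one-shot 3D head reenactment.	neural radiance field
9	MuRF: Multi-Baseline Radiance Fields	MuRF: a multi-baseline radiance field method for sparse-view synthesis that works across different baseline settings.	NeRF
10	GenDeF: Learning Generative Deformation Field for Video Generation	GenDeF: high-quality video generation by learning a generative deformation field.	optical flow
11	Correspondences of the Third Kind: Camera Pose Estimation from Object Reflection	Proposes a camera pose estimation method based on object reflections that does not rely on background information.	NeRF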

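🔬 Pillar 9: Embodied Foundation Models (10 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
12	Improved Visual Grounding through Self-Consistent Explanations	Proposes SelfEQ, a self-consistent explanation method that improves the performance of visual grounding models.	large language model visual grounding
13	VRPTEST: Evaluating Visual Referring Prompting in Large Multimodal Models	VRPTEST: a benchmark dataset and automated evaluation framework for assessing visual referring prompting in large multimodal models.	foundation model multimodal
14	Improving Medical Report Generation with Adapter Tuning and Knowledge Enhancement in Vision-Language Foundation Models	Proposes a medical report generation method based on adapter tuning and knowledge enhancement, improving the performance of vision-language foundation models in the medical domain.	large language model foundation model
15	Multimodal Industrial Anomaly Detection by Crossmodal Feature Mapping	Proposes a lightweight industrial anomaly detection framework based on crossmodal feature mapping.	multimodal
16	Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation	Proposes Rein, which harnesses vision foundation models for domain-generalized semantic segmentation and surpasses full-parameter fine-tuning with only a small number of parameters.	foundation model	✅
17	Fine-tuning vision foundation model for crack segmentation in civil infrastructures	CrackSAM: a fine-tuned vision foundation model for crack segmentation in civil infrastructure.	foundation model
18	Large Language Models are Good Prompt Learners for Low-Shot Image Classification	Proposes LLaMP, which uses large language models to enhance CLIP and improve low-shot image classification.	large language model	✅
19	Generating Illustrated Instructions	Proposes StackedDiffusion, a model that generates customized illustrated instructions and outperforms existing methods.	large language model multimodal
20	NewMove: Customizing text-to-video models with novel motions	NewMove: extends text-to-video generation models with customized motions.	multimodal
21	GPT4SGG: Synthesizing Scene Graphs from Holistic and Region-specific Narratives	GPT4SGG: synthesizes scene graphs from holistic and region-specific narratives to improve SGG model performance.	large language model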

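🔬 Pillar 2: RL & Architecture (4 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
22	Augmentation-Free Dense Contrastive Knowledge Distillation for Efficient Semantic Segmentation	Proposes an augmentation-free dense contrastive knowledge distillation method that improves the efficiency and accuracy of semantic segmentation.	contrastive learning teacher-student distillation	✅
23	HyperDreamer: Hyper-Realistic 3D Content Generation and Editing from a Single Image	HyperDreamer: generates and edits hyper-realistic 3D content from a single image.	dreamer
24	Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors	Proposes BiDiff, a bidirectional diffusion model that combines 2D and 3D priors to improve text-to-3D generation quality.	distillation foundation model	✅
25	PartDistill: 3D Shape Part Segmentation by Vision-Language Model Distillation	Proposes PartDistill, which performs 3D shape part segmentation via vision-language model distillation.	distillation	✅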

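🔬 Pillar 1: Robot Control (2 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
26	PhysHOI: Physics-Based Imitation of Dynamic Human-Object Interaction	Proposes PhysHOI, which achieves physics-based dynamic human-object interaction through imitation learning without task-specific rewards.	humanoid reward design human-object interaction
27	Inversion-Free Image Editing with Natural Language	Proposes InfEdit for inversion-free image editing with natural language, balancing consistency and efficiency.	manipulation	✅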

🔬 Pillar 4: Generative Motion (2 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
28	DiffusionPhase: Motion Diffusion in Frequency Domain	DiffusionPhase: a frequency-domain motion diffusion method for generating high-quality, diverse human motion sequences.	motion diffusion text-to-motion motion generation
29	Digital Life Project: Autonomous 3D Characters with Social Intelligence	Proposes Digital Life Project, which builds autonomous 3D characters with social intelligence.	text-driven motion motion synthesis motion generation

🔬 Pillar 6: Video Extraction (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
30	LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos	LifelongMemory: leverages large language models to answer queries in long-form egocentric videos.	egocentric Ego4D large language model	✅

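🔬 Pillar 5: Interaction & Reaction (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
31	Instance Tracking in 3D Scenes from Egocentric Videos	Proposes the IT3DEgo benchmark dataset and instance tracking methods for tracking instances in 3D scenes from egocentric videos.	human-object interaction egocentric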
