cs.CV(2023-12-07)

📊 31 papers in total | 🔗 7 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (11) · Pillar 9: Embodied Large Models (Embodied Foundation Models) (10 🔗2) · Pillar 2: RL Algorithms and Architecture (RL & Architecture) (4 🔗3) · Pillar 1: Robot Control (2 🔗1) · Pillar 4: Generative Motion (2) · Pillar 6: Video Extraction and Matching (Video Extraction) (1 🔗1) · Pillar 5: Interaction & Reaction (1)

🔬 Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (11 papers)

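# | Title | One-Sentence Summary | Tags | 🔗
1 | Open-Vocabulary Segmentation with Semantic-Assisted Calibration | Proposes SCAN, a semantic-assisted calibration network, to address in-vocabulary bias and domain bias in open-vocabulary segmentation. | open-vocabulary, open vocabulary |
2 | GSGFormer: Generative Social Graph Transformer for Multimodal Pedestrian Trajectory Prediction | GSGFormer: a generative social graph Transformer for multimodal pedestrian trajectory prediction. | semantic map, multimodal |
3 | Camera Height Doesn't Change: Unsupervised Training for Metric Monocular Road-Scene Depth Estimation | Proposes the FUMET framework, which trains a monocular depth network without supervision using only driving videos, achieving absolute-scale, metric depth estimation. | depth estimation, monocular depth, metric depth |
4 | EAGLES: Efficient Accelerated 3D Gaussians with Lightweight EncodingS | EAGLES: lightweight encodings for efficient, accelerated 3D Gaussians that significantly reduce memory usage. | 3D gaussian splatting, gaussian splatting, splatting |
5 | Text as Image: Learning Transferable Adapter for Multi-Label Classification | Proposes the Text as Image method, which learns a transferable adapter for multi-label image classification. | open-vocabulary, open vocabulary, large language model |
6 | Auto-Vocabulary Semantic Segmentation | Proposes the AutoSeg framework for auto-vocabulary semantic segmentation without predefined categories. | open-vocabulary, open vocabulary, large language model |
7 | MonoGaussianAvatar: Monocular Gaussian Point-based Head Avatar | Proposes MonoGaussianAvatar, which reconstructs and animates photorealistic head avatars from monocular video. | gaussian splatting, splatting, implicit representation |
8 | VOODOO 3D: Volumetric Portrait Disentanglement for One-Shot 3D Head Reenactment | Proposes VOODOO 3D, a volumetric disentanglement framework for one-shot 3D head reenactment. | neural radiance field |
9 | MuRF: Multi-Baseline Radiance Fields | MuRF: a multi-baseline radiance field method for sparse-view synthesis that works across different baseline settings. | NeRF |
10 | GenDeF: Learning Generative Deformation Field for Video Generation | GenDeF: high-quality video generation by learning a generative deformation field. | optical flow |
11 | Correspondences of the Third Kind: Camera Pose Estimation from Object Reflection | Proposes a camera pose estimation method based on object reflections that does not rely on background information. | NeRF |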

🔬 Pillar 9: Embodied Large Models (Embodied Foundation Models) (10 papers)

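# | Title | One-Sentence Summary | Tags | 🔗
12 | Improved Visual Grounding through Self-Consistent Explanations | Proposes SelfEQ, a self-consistent explanation method that improves the performance of visual grounding models. | large language model, visual grounding |
13 | VRPTEST: Evaluating Visual Referring Prompting in Large Multimodal Models | VRPTEST: a benchmark dataset and automated evaluation framework for assessing visual referring prompting in large multimodal models. | foundation model, multimodal |
14 | Improving Medical Report Generation with Adapter Tuning and Knowledge Enhancement in Vision-Language Foundation Models | Proposes a medical report generation method based on adapter tuning and knowledge enhancement, improving the performance of vision-language foundation models in the medical domain. | large language model, foundation model |
15 | Multimodal Industrial Anomaly Detection by Crossmodal Feature Mapping | Proposes a lightweight industrial anomaly detection framework based on crossmodal feature mapping. | multimodal |
16 | Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation | Proposes the Rein method, which harnesses vision foundation models for domain generalized semantic segmentation and outperforms full-parameter fine-tuning with only a small number of parameters. | foundation model |
17 | Fine-tuning vision foundation model for crack segmentation in civil infrastructures | Fine-tunes the vision foundation model CrackSAM for crack segmentation in civil infrastructure. | foundation model |
18 | Large Language Models are Good Prompt Learners for Low-Shot Image Classification | Proposes LLaMP, which uses large language models to enhance CLIP and improve low-shot image classification performance. | large language model |
19 | Generating Illustrated Instructions | Proposes the StackedDiffusion model, which generates personalized illustrated instructions and outperforms existing methods. | large language model, multimodal |
20 | NewMove: Customizing text-to-video models with novel motions | NewMove: extends the capabilities of text-to-video generation models through customized motions. | multimodal |
21 | GPT4SGG: Synthesizing Scene Graphs from Holistic and Region-specific Narratives | GPT4SGG: synthesizes scene graphs from holistic and region-specific narratives, improving SGG model performance. | large language model |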

🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (4 papers)

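# | Title | One-Sentence Summary | Tags | 🔗
22 | Augmentation-Free Dense Contrastive Knowledge Distillation for Efficient Semantic Segmentation | Proposes an augmentation-free dense contrastive knowledge distillation method that improves the efficiency and accuracy of semantic segmentation. | contrastive learning, teacher-student, distillation |
23 | HyperDreamer: Hyper-Realistic 3D Content Generation and Editing from a Single Image | HyperDreamer: generates and edits hyper-realistic 3D content from a single image. | dreamer |
24 | Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors | Proposes BiDiff, a bidirectional diffusion model that combines 2D and 3D priors to improve text-to-3D generation quality. | distillation, foundation model |
25 | PartDistill: 3D Shape Part Segmentation by Vision-Language Model Distillation | Proposes PartDistill, which performs 3D shape part segmentation through vision-language model distillation. | distillation |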

🔬 Pillar 1: Robot Control (2 papers)

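# | Title | One-Sentence Summary | Tags | 🔗
26 | PhysHOI: Physics-Based Imitation of Dynamic Human-Object Interaction | Proposes PhysHOI, which achieves physics-based dynamic human-object interaction through imitation learning, without task-specific rewards. | humanoid, reward design, human-object interaction |
27 | Inversion-Free Image Editing with Natural Language | Proposes InfEdit for inversion-free image editing with natural language, balancing consistency and efficiency. | manipulation |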

🔬 Pillar 4: Generative Motion (2 papers)

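# | Title | One-Sentence Summary | Tags | 🔗
28 | DiffusionPhase: Motion Diffusion in Frequency Domain | DiffusionPhase: a frequency-domain motion diffusion method for generating high-quality, diverse human motion sequences. | motion diffusion, text-to-motion, motion generation |
29 | Digital Life Project: Autonomous 3D Characters with Social Intelligence | Proposes Digital Life Project, which builds autonomous 3D characters with social intelligence. | text-driven motion, motion synthesis, motion generation |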

🔬 Pillar 6: Video Extraction and Matching (Video Extraction) (1 paper)

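# | Title | One-Sentence Summary | Tags | 🔗
30 | LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos | LifelongMemory: uses large language models for question answering over long-form egocentric videos. | egocentric, Ego4D, large language model |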

🔬 Pillar 5: Interaction & Reaction (1 paper)

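# | Title | One-Sentence Summary | Tags | 🔗
31 | Instance Tracking in 3D Scenes from Egocentric Videos | Proposes the IT3DEgo benchmark dataset and an instance tracking method to tackle egocentric instance tracking in 3D scenes. | human-object interaction, egocentric |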
