cs.CV (2024-12-26)

📊 23 papers in total | 🔗 5 with code

🎯 Interest Areas

Pillar 3: Perception & Semantics (9 🔗2) · Pillar 9: Embodied Foundation Models (8 🔗2) · Pillar 2: RL & Architecture (5 🔗1) · Pillar 1: Robot Control (1)

🔬 Pillar 3: Perception & Semantics (9 papers)

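| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 1 | MVS-GS: High-Quality 3D Gaussian Splatting Mapping via Online Multi-View Stereo | MVS-GS: high-quality 3D Gaussian splatting mapping via online multi-view stereo | depth estimation, 3D gaussian splatting, 3DGS | |
| 2 | BeSplat: Gaussian Splatting from a Single Blurry Image and Event Stream | BeSplat: reconstructs a Gaussian splatting scene from a single blurry image and an event stream | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 3 | Reflective Gaussian Splatting | Proposes Ref-Gaussian for real-time, high-quality 3D reconstruction and novel view synthesis of reflective objects | 3DGS, gaussian splatting, splatting | |
| 4 | PlanLLM: Video Procedure Planning with Refinable Large Language Models | Proposes PlanLLM to address the open-vocabulary problem in video procedure planning | open-vocabulary, open vocabulary, embodied AI | |
| 5 | Relation-aware Hierarchical Prompt for Open-vocabulary Scene Graph Generation | Proposes a relation-aware hierarchical prompting framework to address the limitations of text representations in open-vocabulary scene graph generation | open-vocabulary, open vocabulary, large language model | |
| 6 | Generating Editable Head Avatars with 3D Gaussian GANs | Proposes an editable head avatar generation method based on 3D Gaussian GANs, improving controllability and realism | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 7 | Learning Monocular Depth from Events via Egomotion Compensation | Proposes a motion-compensation-based framework for monocular depth estimation with event cameras, improving depth prediction accuracy | depth estimation, monocular depth | |
| 8 | Humans as a Calibration Pattern: Dynamic 3D Scene Reconstruction from Unsynchronized and Uncalibrated Videos | Uses humans as a calibration pattern to reconstruct dynamic 3D scenes from unsynchronized, uncalibrated videos | scene reconstruction | |
| 9 | Revisiting Monocular 3D Object Detection with Depth Thickness Field | Proposes the depth thickness field MonoDTF, improving structure awareness in monocular 3D object detection | depth estimation, monocular depth | |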

🔬 Pillar 9: Embodied Foundation Models (8 papers)

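| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 10 | Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment | Proposes Task Preference Optimization (TPO) to improve the fine-grained understanding of multimodal large language models on vision tasks | large language model, multimodal | |
| 11 | SeaMo: A Season-Aware Multimodal Foundation Model for Remote Sensing | SeaMo: a season-aware multimodal foundation model for remote sensing that improves performance on Earth observation tasks | foundation model, multimodal | |
| 12 | Referencing Where to Focus: Improving Visual Grounding with Referential Query | Proposes RefFormer, which improves DETR-based visual grounding by introducing referential queries, raising localization accuracy | visual grounding | |
| 13 | Semantic Residual for Multimodal Unified Discrete Representation | Proposes a semantic-residual framework for cross-modal information disentanglement, improving multimodal unified discrete representations | multimodal | |
| 14 | Perceive, Query & Reason: Enhancing Video QA with Question-Guided Temporal Queries | Proposes T-Former, which enhances video question answering through question-guided temporal modeling | large language model, multimodal | |
| 15 | Instruction-Guided Fusion of Multi-Layer Visual Features in Large Vision-Language Models | Proposes an instruction-guided visual aggregator for dynamically fusing multi-layer visual features in LVLMs | large language model, multimodal | |
| 16 | Multi-Head Attention Driven Dynamic Visual-Semantic Embedding for Enhanced Image-Text Matching | Proposes a multi-head-attention-driven dynamic visual-semantic embedding model that improves image-text matching | multimodal | |
| 17 | Reversed in Time: A Novel Temporal-Emphasized Benchmark for Cross-Modal Video-Text Retrieval | Proposes the RTime dataset to address temporal understanding in video-text retrieval | multimodal | |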

🔬 Pillar 2: RL & Architecture (5 papers)

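| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 18 | CLIP-GS: Unifying Vision-Language Representation with 3D Gaussian Splatting | Proposes CLIP-GS, which unifies vision-language representations through 3D Gaussian splatting to improve multimodal 3D understanding | representation learning, 3D gaussian splatting, 3DGS | |
| 19 | Federated Hybrid Training and Self-Adversarial Distillation: Towards Robust Edge Networks | Proposes the FedBAT framework, which improves the robustness of federated learning in edge networks through hybrid adversarial training and self-adversarial distillation | distillation | |
| 20 | Advanced Knowledge Transfer: Refined Feature Distillation for Zero-Shot Quantization in Edge Computing | Proposes AKT, a refined feature distillation method for zero-shot quantization in edge computing | distillation | |
| 21 | MoPD: Mixture-of-Prompts Distillation for Vision-Language Models | Proposes Mixture-of-Prompts Distillation (MoPD) to improve the generalization of vision-language models to unseen classes | distillation | |
| 22 | Reconstruction Target Matters in Masked Image Modeling for Cross-Domain Few-Shot Learning | Proposes DAMIM, which addresses domain shift in cross-domain few-shot learning through adaptive feature reconstruction | masked autoencoder, MAE | |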

🔬 Pillar 1: Robot Control (1 paper)

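| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 23 | FACEMUG: A Multimodal Generative and Fusion Framework for Local Facial Editing | Proposes the FACEMUG framework for multimodal local facial editing | manipulation, semantic map, multimodal | |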
