cs.CV (2024-10-10)

📊 27 papers in total | 🔗 9 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (9 🔗5) · Pillar 9: Embodied Foundation Models (9 🔗1) · Pillar 2: RL & Architecture (4 🔗2) · Pillar 4: Generative Motion (2 🔗1) · Pillar 1: Robot Control (1) · Pillar 7: Motion Retargeting (1) · Pillar 6: Video Extraction (1)

🔬 Pillar 3: Perception & Semantics (9 papers)

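| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 1 | MotionGS: Exploring Explicit Motion Guidance for Deformable 3D Gaussian Splatting | MotionGS: proposes a deformable 3D Gaussian splatting method with explicit motion guidance for dynamic scene reconstruction. | 3D gaussian splatting, gaussian splatting, splatting |
| 2 | Poison-splat: Computation Cost Attack on 3D Gaussian Splatting | Proposes the Poison-splat attack, exposing a computation-cost security vulnerability in 3D Gaussian splatting training. | 3D gaussian splatting, 3DGS, gaussian splatting |
| 3 | Fast Feedforward 3D Gaussian Splatting Compression | Proposes FCGS, a fast feedforward compression method for 3D Gaussian splatting that requires no per-scene optimization. | 3D gaussian splatting, 3DGS, gaussian splatting |
| 4 | IncEventGS: Pose-Free Gaussian Splatting from a Single Event Camera | IncEventGS: pose-free Gaussian splatting reconstruction from a single event camera. | visual odometry, 3D gaussian splatting, gaussian splatting |
| 5 | Neural Material Adaptor for Visual Grounding of Intrinsic Dynamics | Proposes the Neural Material Adaptor (NeuMA) for visual grounding of intrinsic dynamics. | 3D gaussian splatting, gaussian splatting, splatting |
| 6 | DifFRelight: Diffusion-Based Facial Performance Relighting | Proposes a diffusion-based facial performance relighting framework that achieves high-fidelity lighting control under free viewpoints. | 3D gaussian splatting, gaussian splatting, splatting |
| 7 | A transition towards virtual representations of visual scenes | Proposes a visual scene understanding architecture oriented toward 3D virtual synthesis, enabling unified and flexible scene description. | scene understanding |
| 8 | Generalizable and Animatable Gaussian Head Avatar | Proposes GAGAvatar, which generates a generalizable and animatable Gaussian head avatar from a single image. | neural radiance field |
| 9 | Test-Time Intensity Consistency Adaptation for Shadow Detection | Proposes the TICA framework to address consistency problems in shadow detection. | scene understanding |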

🔬 Pillar 9: Embodied Foundation Models (9 papers)

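| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 10 | Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision | Proposes DIFFLMM, in which visual grounding ability emerges in large multimodal models without additional supervision. | multimodal, visual grounding |
| 11 | Music Genre Classification using Large Language Models | Studies music genre classification using pretrained large language models. | large language model |
| 12 | MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models | MRAG-Bench: proposes a vision-centric evaluation benchmark for multimodal retrieval-augmented generation. | multimodal |
| 13 | Exploring Foundation Models in Remote Sensing Image Change Detection: A Comprehensive Survey | Survey: explores the application of foundation models to remote sensing image change detection. | foundation model |
| 14 | AgroGPT: Efficient Agricultural Vision-Language Model with Expert Tuning | Proposes AgroGPT to address the lack of domain knowledge in agricultural conversational models. | large language model, multimodal |
| 15 | SG-Nav: Online 3D Scene Graph Prompting for LLM-based Zero-shot Object Navigation | SG-Nav: zero-shot object navigation based on an LLM and online 3D scene graph prompting. | chain-of-thought |
| 16 | LatteCLIP: Unsupervised CLIP Fine-Tuning via LMM-Synthetic Texts | LatteCLIP: unsupervised CLIP fine-tuning via LMM-synthesized texts, improving domain generalization. | multimodal |
| 17 | In Search of Forgotten Domain Generalization | Builds a large-scale, strictly style-OOD dataset, revealing the illusion of OOD generalization from training on web data. | foundation model |
| 18 | TurboRAG: Accelerating Retrieval-Augmented Generation with Precomputed KV Caches for Chunked Text | TurboRAG: accelerates retrieval-augmented generation over chunked text with precomputed KV caches. | large language model |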

🔬 Pillar 2: RL & Architecture (4 papers)

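| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 19 | Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training | Mono-InternVL: improves the performance of monolithic multimodal large language models through endogenous visual pre-training. | visual pre-training, large language model, multimodal |
| 20 | SPA: 3D Spatial-Awareness Enables Effective Embodied Representation | SPA: enhances effective representation learning for embodied AI through 3D spatial awareness. | representation learning, embodied AI, language conditioned |
| 21 | LaB-CL: Localized and Balanced Contrastive Learning for improving parking slot detection | Proposes the LaB-CL framework, which improves parking slot detection through localized and balanced contrastive learning. | contrastive learning |
| 22 | MGMapNet: Multi-Granularity Representation Learning for End-to-End Vectorized HD Map Construction | MGMapNet: multi-granularity representation learning for end-to-end vectorized HD map construction. | representation learning |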

🔬 Pillar 4: Generative Motion (2 papers)

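| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 23 | MMHead: Towards Fine-grained Multi-modal 3D Facial Animation | MMHead: builds a multi-modal 3D facial animation dataset and proposes a text-driven animation generation method. | motion generation, VQ-VAE |
| 24 | Optimal-state Dynamics Estimation for Physics-based Human Motion Capture from Videos | Proposes a physics-based human motion capture method built on a neural Kalman filter, improving motion smoothness and physical plausibility. | physically plausible |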

🔬 Pillar 1: Robot Control (1 paper)

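| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 25 | Understanding Spatio-Temporal Relations in Human-Object Interaction using Pyramid Graph Convolutional Network | Proposes the Pyramid Graph Convolutional Network (PGCN) for understanding spatio-temporal relations in human-object interaction, enabling action recognition and segmentation. | bi-manual, human-object interaction |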

🔬 Pillar 7: Motion Retargeting (1 paper)

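| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 26 | SeMv-3D: Towards Concurrency of Semantic and Multi-view Consistency in General Text-to-3D Generation | SeMv-3D: jointly optimizes semantic and multi-view consistency for general text-to-3D generation. | geometric consistency |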

🔬 Pillar 6: Video Extraction (1 paper)

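| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 27 | ToMiE: Towards Explicit Exoskeleton for the Reconstruction of Complicated 3D Human Avatars | ToMiE: proposes an explicit exoskeleton method for reconstructing complicated 3D human avatars. | SMPL |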
