cs.CV (2024-05-07)

📊 19 papers in total | 🔗 5 with code

🎯 Interest area navigation

Pillar 3: Perception & Semantics (6 🔗1) · Pillar 2: RL & Architecture (5 🔗1) · Pillar 9: Embodied Foundation Models (3) · Pillar 1: Robot Control (2 🔗2) · Pillar 8: Physics-based Animation (1 🔗1) · Pillar 5: Interaction & Reaction (1) · Pillar 7: Motion Retargeting (1)

🔬 Pillar 3: Perception & Semantics (6 papers)

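| # | Title | Key point | Tags | 🔗 |
|---|-------|-----------|------|----|
| 1 | Novel View Synthesis with Neural Radiance Fields for Industrial Robot Applications | Proposes a NeRF novel view synthesis method based on robot kinematics for industrial robot applications. | NeRF, neural radiance field, scene reconstruction | |
| 2 | DistGrid: Scalable Scene Reconstruction with Distributed Multi-resolution Hash Grid | Proposes DistGrid, which reconstructs large-scale scenes using a distributed multi-resolution hash grid. | NeRF, neural radiance field, scene reconstruction | |
| 3 | Edit-Your-Motion: Space-Time Diffusion Decoupling Learning for Video Motion Editing | Edit-Your-Motion: space-time decoupled diffusion learning for video motion editing, addressing poor generalization. | implicit representation, human motion | |
| 4 | Radar Fields: Frequency-Space Neural Scene Representations for FMCW Radar | Proposes Radar Fields, a frequency-space neural scene representation for FMCW radar that enables scene reconstruction in adverse weather. | scene reconstruction | |
| 5 | Tactile-Augmented Radiance Fields | Proposes Tactile-Augmented Radiance Fields (TaRF), which fuse visual and tactile information for 3D scene reconstruction and perception. | neural radiance field | |
| 6 | Light Field Compression Based on Implicit Neural Representation | Proposes a light field compression scheme based on implicit neural representation that effectively reduces inter-view redundancy. | implicit representation | |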

🔬 Pillar 2: RL & Architecture (5 papers)

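| # | Title | Key point | Tags | 🔗 |
|---|-------|-----------|------|----|
| 7 | DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving | DriveWorld: 4D pre-trained scene understanding for autonomous driving via world models. | world model, latent dynamics, representation learning | |
| 8 | VMambaCC: A Visual State Space Model for Crowd Counting | Proposes VMambaCC, which uses a visual state space model for crowd counting. | Mamba, state space model | |
| 9 | Vision Mamba: A Comprehensive Survey and Taxonomy | A comprehensive survey and taxonomy of Mamba models in vision, aimed at promoting their use in vision tasks. | Mamba, SSM, state space model | |
| 10 | ELiTe: Efficient Image-to-LiDAR Knowledge Transfer for Semantic Segmentation | Proposes ELiTe, which improves semantic segmentation through efficient image-to-LiDAR knowledge transfer. | representation learning, distillation, foundation model | |
| 11 | Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing | Proposes the DARLING framework, which disentangles the style and content features of scene text images to improve recognition, removal, and editing. | representation learning | |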

🔬 Pillar 9: Embodied Foundation Models (3 papers)

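| # | Title | Key point | Tags | 🔗 |
|---|-------|-----------|------|----|
| 12 | Leveraging Medical Foundation Model Features in Graph Neural Network-Based Retrieval of Breast Histopathology Images | Proposes a graph neural network-based retrieval method for breast histopathology images that uses features from a pre-trained medical model. | foundation model | |
| 13 | Sign2GPT: Leveraging Large Language Models for Gloss-Free Sign Language Translation | Sign2GPT: uses large language models for gloss-free sign language translation. | large language model | |
| 14 | Learning To See But Forgetting To Follow: Visual Instruction Tuning Makes LLMs More Prone To Jailbreak Attacks | Visual instruction tuning makes LLMs more prone to jailbreak attacks, weakening their safety. | large language model | |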

🔬 Pillar 1: Robot Control (2 papers)

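| # | Title | Key point | Tags | 🔗 |
|---|-------|-----------|------|----|
| 15 | Diff-IP2D: Diffusion-Based Hand-Object Interaction Prediction on Egocentric Videos | Proposes Diff-IP2D, which uses a diffusion model to predict hand-object interaction in egocentric videos, addressing error accumulation in unidirectional prediction. | manipulation, affordance, egocentric | |
| 16 | SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing | SEED-Data-Edit: a hybrid dataset for instructional image editing that makes image manipulation more flexible. | manipulation, large language model, multimodal | |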

🔬 Pillar 8: Physics-based Animation (1 paper)

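| # | Title | Key point | Tags | 🔗 |
|---|-------|-----------|------|----|
| 17 | ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers | ViewFormer: explores spatiotemporal modeling for multi-view 3D occupancy perception with view-guided Transformers. | spatiotemporal | |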

🔬 Pillar 5: Interaction & Reaction (1 paper)

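| # | Title | Key point | Tags | 🔗 |
|---|-------|-----------|------|----|
| 18 | ChatHuman: Chatting about 3D Humans with Tools | Proposes ChatHuman to address the complexity of analyzing 3D human tasks. | human-object interaction, large language model | |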

🔬 Pillar 7: Motion Retargeting (1 paper)

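| # | Title | Key point | Tags | 🔗 |
|---|-------|-----------|------|----|
| 19 | Non-rigid Structure-from-Motion: Temporally-smooth Procrustean Alignment and Spatially-variant Deformation Modeling | Proposes temporally smooth Procrustean alignment and spatially variant deformation modeling for non-rigid SfM. | motion recovery | |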
