cs.CV (2024-05-07)

📊 19 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (6 🔗1) · Pillar 2: RL Algorithms & Architecture (5 🔗1) · Pillar 9: Embodied Foundation Models (3) · Pillar 1: Robot Control (2 🔗2) · Pillar 8: Physics-based Animation (1 🔗1) · Pillar 5: Interaction & Reaction (1) · Pillar 7: Motion Retargeting (1)

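🔬 Pillar 3: Spatial Perception & Semantics (6 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
1	Novel View Synthesis with Neural Radiance Fields for Industrial Robot Applications	Proposes a robot-kinematics-based NeRF novel view synthesis method for industrial robot applications.	NeRF, neural radiance field, scene reconstruction
2	DistGrid: Scalable Scene Reconstruction with Distributed Multi-resolution Hash Grid	Proposes DistGrid, which uses a distributed multi-resolution hash grid for large-scale scene reconstruction.	NeRF, neural radiance field, scene reconstruction
3	Edit-Your-Motion: Space-Time Diffusion Decoupling Learning for Video Motion Editing	Edit-Your-Motion: space-time decoupled diffusion learning for video motion editing, addressing poor generalization.	implicit representation, human motion
4	Radar Fields: Frequency-Space Neural Scene Representations for FMCW Radar	Proposes Radar Fields, a frequency-space neural scene representation for FMCW radar that enables scene reconstruction in adverse weather.	scene reconstruction
5	Tactile-Augmented Radiance Fields	Proposes Tactile-Augmented Radiance Fields (TaRF), which fuse visual and tactile information for 3D scene reconstruction and perception.	neural radiance field	✅
6	Light Field Compression Based on Implicit Neural Representation	Proposes a light field compression scheme based on implicit neural representation that effectively reduces inter-view redundancy.	implicit representation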

🔬 Pillar 2: RL Algorithms & Architecture (5 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
7	DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving	DriveWorld: 4D pre-trained scene understanding for autonomous driving via world models.	world model, latent dynamics, representation learning
8	VMambaCC: A Visual State Space Model for Crowd Counting	Proposes VMambaCC, which uses a visual state space model for crowd counting.	Mamba, state space model
9	Vision Mamba: A Comprehensive Survey and Taxonomy	A comprehensive survey and taxonomy of Mamba models in vision, aimed at promoting their use in vision tasks.	Mamba, SSM, state space model	✅
10	ELiTe: Efficient Image-to-LiDAR Knowledge Transfer for Semantic Segmentation	Proposes ELiTe, which improves semantic segmentation through efficient image-to-LiDAR knowledge transfer.	representation learning, distillation, foundation model
11	Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing	Proposes the DARLING framework, which disentangles style and content features of scene text images to improve recognition, removal, and editing.	representation learning

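🔬 Pillar 9: Embodied Foundation Models (3 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
12	Leveraging Medical Foundation Model Features in Graph Neural Network-Based Retrieval of Breast Histopathology Images	Proposes a graph neural network-based retrieval method for breast histopathology images that uses features from medical pre-trained models.	foundation model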
13	Sign2GPT: Leveraging Large Language Models for Gloss-Free Sign Language Translation	Sign2GPT: leverages large language models for gloss-free sign language translation.	large language model
14	Learning To See But Forgetting To Follow: Visual Instruction Tuning Makes LLMs More Prone To Jailbreak Attacks	Shows that visual instruction tuning makes LLMs more prone to jailbreak attacks, compromising their safety.	large language model

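🔬 Pillar 1: Robot Control (2 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
15	Diff-IP2D: Diffusion-Based Hand-Object Interaction Prediction on Egocentric Videos	Proposes Diff-IP2D, which uses a diffusion model to predict hand-object interaction in egocentric videos, addressing error accumulation in unidirectional prediction.	manipulation, affordance, egocentric	✅
16	SEED-Data-Edit Technical Report: A Hybrid Dataset for Instructional Image Editing	SEED-Data-Edit: a hybrid dataset for instructional image editing that improves the flexibility of image manipulation.	manipulation, large language model, multimodal	✅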

🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
17	ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers	ViewFormer: explores spatiotemporal modeling for multi-view 3D occupancy perception using view-guided Transformers.	spatiotemporal	✅

🔬 Pillar 5: Interaction & Reaction (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
18	ChatHuman: Chatting about 3D Humans with Tools	Proposes ChatHuman to address the complexity of analyzing 3D human tasks.	human-object interaction, large language model

🔬 Pillar 7: Motion Retargeting (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
19	Non-rigid Structure-from-Motion: Temporally-smooth Procrustean Alignment and Spatially-variant Deformation Modeling	Proposes temporally smooth Procrustean alignment and spatially variant deformation modeling for non-rigid structure-from-motion.	motion recovery
