cs.CV (2024-04-05)

📊 11 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (4) · Pillar 2: RL & Architecture (2 🔗1) · Pillar 9: Embodied Foundation Models (2) · Pillar 1: Robot Control (2 🔗1) · Pillar 7: Motion Retargeting (1)

🔬 Pillar 3: Perception & Semantics (4 papers)

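| # | Title | Key point | Tags | 🔗 |
|---|-------|-----------|------|----|
| 1 | Robust Gaussian Splatting | Proposes a robust Gaussian splatting method to address blur and color inconsistency in 3D reconstruction | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 2 | Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models | Proposes Concept Weaver to address multi-concept fusion | concept fusion | |
| 3 | SpatialTracker: Tracking Any 2D Pixels in 3D Space | Proposes SpatialTracker to track 2D video pixels in 3D space | monocular depth | |
| 4 | Deep Phase Coded Image Prior | Proposes a deep phase-coded image prior for depth estimation and all-in-focus imaging | depth estimation | |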

🔬 Pillar 2: RL & Architecture (2 papers)

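| # | Title | Key point | Tags | 🔗 |
|---|-------|-----------|------|----|
| 5 | Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation | Proposes the Sigma network for multi-modal semantic segmentation | Mamba, SSM, state space model | |
| 6 | Evaluating Adversarial Robustness: A Comparison Of FGSM, Carlini-Wagner Attacks, And The Role of Distillation as Defense Mechanism | Evaluates adversarial robustness by comparing FGSM and Carlini-Wagner (CW) attacks, with distillation as a defense mechanism | distillation | |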

🔬 Pillar 9: Embodied Foundation Models (2 papers)

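| # | Title | Key point | Tags | 🔗 |
|---|-------|-----------|------|----|
| 7 | Idea23D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs | Proposes Idea23D for 3D model generation from multimodal inputs | multimodal | |
| 8 | Physical Property Understanding from Language-Embedded Feature Fields | Proposes a new method for understanding physical properties via language-embedded feature fields | large language model | |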

🔬 Pillar 1: Robot Control (2 papers)

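| # | Title | Key point | Tags | 🔗 |
|---|-------|-----------|------|----|
| 9 | ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing | Proposes ClickDiffusion for precise image editing | manipulation | |
| 10 | RaSim: A Range-aware High-fidelity RGB-D Data Simulation Pipeline for Real-world Applications | Proposes RaSim to address the realism gap in RGB-D data simulation | sim-to-real | |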

🔬 Pillar 7: Motion Retargeting (1 paper)

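| # | Title | Key point | Tags | 🔗 |
|---|-------|-----------|------|----|
| 11 | PhysPT: Physics-aware Pretrained Transformer for Estimating Human Dynamics from Monocular Videos | Proposes PhysPT to address physical implausibility in human dynamics estimated from monocular videos | human motion | |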
