cs.CV(2024-08-13)
📊 21 papers in total | 🔗 2 with code
🎯 Interest Area Navigation
Pillar 3: Spatial Perception & Semantics (6 🔗1)
Pillar 9: Embodied Foundation Models (6 🔗1)
Pillar 2: RL Algorithms & Architecture (3)
Pillar 5: Interaction & Reaction (2)
Pillar 4: Generative Motion (1)
Pillar 1: Robot Control (1)
Pillar 7: Motion Retargeting (1)
Pillar 8: Physics-based Animation (1)
🔬 Pillar 3: Spatial Perception & Semantics (6 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | SpectralGaussians: Semantic, spectral 3D Gaussian splatting for multi-spectral scene representation, visualization and analysis | Proposes SpectralGaussians, a semantic, spectral 3D Gaussian splatting approach for representing, visualizing, and analyzing multi-spectral scenes. | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 2 | HDRGS: High Dynamic Range Gaussian Splatting | Proposes HDRGS, which uses high dynamic range Gaussian splatting to reconstruct high-quality HDR scenes. | gaussian splatting, splatting, NeRF | | |
| 3 | NeRF-US: Removing Ultrasound Imaging Artifacts from Neural Radiance Fields in the Wild | NeRF-US: a method for removing ultrasound imaging artifacts from neural radiance fields in the wild. | NeRF, neural radiance field | | |
| 4 | SceneGPT: A Language Model for 3D Scene Understanding | SceneGPT: a language model for 3D scene understanding that requires no 3D pre-training. | scene understanding, affordance, spatial relationship | | |
| 5 | SlotLifter: Slot-guided Feature Lifting for Learning Object-centric Radiance Fields | Proposes SlotLifter, which learns object-centric radiance fields through slot-guided feature lifting for scene reconstruction and decomposition. | scene reconstruction | | |
| 6 | ActiveNeRF: Learning Accurate 3D Geometry by Active Pattern Projection | ActiveNeRF: learns accurate 3D geometry through active pattern projection. | NeRF | ✅ | |
🔬 Pillar 9: Embodied Foundation Models (6 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 7 | CROME: Cross-Modal Adapters for Efficient Multimodal LLM | CROME: cross-modal adapters for efficient multimodal LLMs. | large language model, multimodal, instruction following | | |
| 8 | PathInsight: Instruction Tuning of Multimodal Datasets and Models for Intelligence Assisted Diagnosis in Histopathology | PathInsight: instruction-tunes multimodal models to support intelligence-assisted diagnosis in histopathology. | multimodal | | |
| 9 | Multimodal Analysis of White Blood Cell Differentiation in Acute Myeloid Leukemia Patients using a β-Variational Autoencoder | Proposes a β-VAE-based multimodal analysis method for understanding white blood cell differentiation in acute myeloid leukemia patients. | multimodal | | |
| 10 | Sumotosima: A Framework and Dataset for Classifying and Summarizing Otoscopic Images | Sumotosima: a deep learning framework and dataset for classifying and summarizing otoscopic images. | multimodal | ✅ | |
| 11 | DC3DO: Diffusion Classifier for 3D Objects | DC3DO: zero-shot 3D object classification with diffusion models, requiring no additional training. | multimodal | | |
| 12 | Specialized Change Detection using Segment Anything | Proposes a specialized SAM-based change detection method for detecting the disappearance of specific targets. | foundation model | | |
🔬 Pillar 2: RL Algorithms & Architecture (3 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 13 | Integrating Saliency Ranking and Reinforcement Learning for Enhanced Object Detection | Combines saliency ranking with reinforcement learning to improve lightweight object detection performance. | reinforcement learning, deep reinforcement learning | | |
| 14 | Breaking Class Barriers: Efficient Dataset Distillation via Inter-Class Feature Compensator | Proposes the Inter-class Feature Compensator (INFER) to efficiently address inter-class feature isolation in dataset distillation. | distillation | | |
| 15 | Oracle Bone Script Similiar Character Screening Approach Based on Simsiam Contrastive Learning and Supervised Learning | Proposes a method for screening similar oracle bone script characters based on SimSiam contrastive learning and supervised learning. | contrastive learning | | |
🔬 Pillar 5: Interaction & Reaction (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 16 | Efficient Human-Object-Interaction (EHOI) Detection via Interaction Label Coding and Conditional Decision | Proposes EHOI, an efficient human-object interaction detector that balances performance, efficiency, and interpretability. | human-object interaction, HOI | | |
| 17 | MV-DETR: Multi-modality indoor object detection by Multi-View DEtecton TRansformers | MV-DETR: multi-modality indoor object detection based on multi-view detection transformers. | ReMoS | | |
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 18 | ViMo: Generating Motions from Casual Videos | Proposes ViMo to address the challenge of generating 3D human motion from videos. | motion generation, video-to-motion | | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 19 | Controlling the World by Sleight of Hand | CosHand: proposes an action-conditioned generative model that predicts how an image changes after a hand-object interaction. | manipulation, world model | | |
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 20 | Visual Neural Decoding via Improved Visual-EEG Semantic Consistency | Proposes a Visual-EEG semantic decoupling framework that improves semantic consistency in visual neural decoding from EEG signals. | geometric consistency | | |
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 21 | Dynamic and Compressive Adaptation of Transformers From Images to Videos | Proposes InTI, which uses dynamic inter-frame token interpolation for compressive adaptation of Transformers from images to videos. | spatiotemporal | | |