cs.CV(2024-10-28)
📊 20 papers in total | 🔗 9 with code
🎯 Areas of interest
Pillar 9: Embodied Foundation Models (7 🔗3)
Pillar 3: Perception & Semantics (6 🔗2)
Pillar 2: RL & Architecture (3 🔗2)
Pillar 4: Generative Motion (2 🔗1)
Pillar 8: Physics-based Animation (1 🔗1)
Pillar 1: Robot Control (1)
🔬 Pillar 9: Embodied Foundation Models (7 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | Going Beyond H&E and Oncology: How Do Histopathology Foundation Models Perform for Multi-stain IHC and Immunology? | Evaluates how well histopathology foundation models generalize to multi-stain immunohistochemistry (IHC) and immunology | foundation model | ✅ | |
| 2 | Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines | Proposes Vision Search Assistant, which enables vision-language models to act as multimodal search engines | multimodal | | |
| 3 | SocialGPT: Prompting LLMs for Social Relation Reasoning via Greedy Segment Optimization | Proposes SocialGPT, which uses LLMs for social relation reasoning and optimizes the prompt | large language model, foundation model | ✅ | |
| 4 | LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior | LARP: a video tokenization method with a learned autoregressive generative prior that improves video generation quality | large language model, multimodal | | |
| 5 | Face-MLLM: A Large Face Perception Model | Proposes Face-MLLM, a multimodal large language model for face perception | large language model, multimodal | | |
| 6 | Large Pre-Training Datasets Don't Always Guarantee Robustness after Fine-Tuning | The robustness of large pre-trained models can degrade after fine-tuning; proposes ImageNet-RIB to evaluate robustness inheritance | foundation model | ✅ | |
| 7 | Improving Generalization in Visual Reasoning via Self-Ensemble | Proposes Self-Ensemble, a training-free method that improves the generalization of visual reasoning models | multimodal | | |
🔬 Pillar 3: Perception & Semantics (6 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | ODGS: 3D Scene Reconstruction from Omnidirectional Images with 3D Gaussian Splattings | ODGS: 3D scene reconstruction from 360-degree omnidirectional images using 3D Gaussian splatting | 3D gaussian splatting, gaussian splatting, splatting | ✅ | |
| 9 | Grid4D: 4D Decomposed Hash Encoding for High-Fidelity Dynamic Gaussian Splatting | Grid4D: 4D decomposed hash encoding for high-fidelity dynamic Gaussian splatting | gaussian splatting, splatting | | |
| 10 | MVSDet: Multi-View Indoor 3D Object Detection via Efficient Plane Sweeps | MVSDet: multi-view indoor 3D object detection using efficient plane sweeps | gaussian splatting, splatting, NeRF | ✅ | |
| 11 | OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup | OmniSep: a unified omni-modality sound separation framework with Query-Mixup that extracts sounds given queries from multiple modalities | open-vocabulary, open vocabulary | | |
| 12 | EEG-Driven 3D Object Reconstruction with Style Consistency and Diffusion Prior | Proposes a 3D object reconstruction method driven by EEG signals and a diffusion prior that improves style consistency | NeRF, neural radiance field | | |
| 13 | Enhancing Action Recognition by Leveraging the Hierarchical Structure of Actions and Textual Context | Improves action recognition by exploiting the hierarchical structure of actions and textual context | optical flow | | |
🔬 Pillar 2: RL & Architecture (3 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 14 | ECMamba: Consolidating Selective State Space Model with Retinex Guidance for Efficient Multiple Exposure Correction | Proposes ECMamba, built on Mamba and Retinex theory, for efficient multiple exposure correction | Mamba, state space model | ✅ | |
| 15 | CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians | CompGS: exploits 2D compositionality through dynamically optimized 3D Gaussians to achieve compositional text-to-3D generation | distillation, 3D gaussian splatting, gaussian splatting | ✅ | |
| 16 | Exploring contextual modeling with linear complexity for point cloud segmentation | Proposes MEEPO, which combines CNNs and Mamba to improve point cloud segmentation efficiently | Mamba, SSM, spatial relationship | | |
🔬 Pillar 4: Generative Motion (2 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 17 | Skinned Motion Retargeting with Dense Geometric Interaction Perception | Proposes MeshRet, which models dense geometric interactions to achieve high-quality motion retargeting for skinned characters | penetration, motion retargeting | ✅ | |
| 18 | Constrained Transformer-Based Porous Media Generation to Spatial Distribution of Rock Properties | Proposes a constrained Transformer-based porous media generation method for modeling the spatial distribution of rock properties | VQ-VAE, spatial relationship | | |
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 19 | Bidirectional Recurrence for Cardiac Motion Tracking with Gaussian Process Latent Coding | GPTrack: a bidirectional recurrent framework for cardiac motion tracking using Gaussian process latent coding | motion tracking | ✅ | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 20 | Synthetica: Large Scale Synthetic Data for Robot Perception | Synthetica: large-scale synthetic data for robot perception that enables fast and robust object detection | sim-to-real | | |