cs.CV (2024-11-14)
📊 23 papers in total | 🔗 6 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (7 🔗3)
Pillar 3: Perception & Semantics (6 🔗2)
Pillar 2: RL & Architecture (5)
Pillar 1: Robot Control (4)
Pillar 4: Generative Motion (1 🔗1)
🔬 Pillar 9: Embodied Foundation Models (7 papers)
🔬 Pillar 3: Perception & Semantics (6 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation | Proposes Trident, a training-free framework for high-performance open-vocabulary segmentation | open-vocabulary open vocabulary foundation model | ✅ | |
| 9 | Self-Supervised Monocular 4D Scene Reconstruction for Egocentric Videos | Proposes EgoMono4D for self-supervised monocular 4D scene reconstruction from egocentric videos | scene reconstruction egocentric | | |
| 10 | DyGASR: Dynamic Generalized Exponential Splatting with Surface Alignment for Accelerated 3D Mesh Reconstruction | DyGASR: accelerated 3D mesh reconstruction via dynamic generalized exponential splatting with surface alignment | 3D gaussian splatting 3DGS gaussian splatting | | |
| 11 | Architect: Generating Vivid and Interactive 3D Scenes with Hierarchical 2D Inpainting | Architect: generates vivid, interactive 3D scenes using hierarchical 2D inpainting | depth estimation embodied AI large language model | | |
| 12 | CropCraft: Complete Structural Characterization of Crop Plants From Images | CropCraft: proposes a method based on inverse procedural modeling for reconstructing the complete 3D structure of crop plants | neural radiance field | | |
| 13 | MFTIQ: Multi-Flow Tracker with Independent Matching Quality Estimation | MFTIQ: a multi-flow tracker with independent matching quality estimation that improves long-term tracking performance | optical flow | ✅ | |
🔬 Pillar 2: RL & Architecture (5 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 14 | Advancing Stroke Risk Prediction Using a Multi-modal Foundation Model | Proposes a multimodal self-supervised framework that fuses brain imaging with clinical data to improve stroke risk prediction accuracy | predictive model contrastive learning foundation model | | |
| 15 | Towards Neural Foundation Models for Vision: Aligning EEG, MEG, and fMRI Representations for Decoding, Encoding, and Modality Conversion | Proposes a neural foundation model that aligns EEG, MEG, and fMRI representations to enable multimodal conversion of visual information | contrastive learning foundation model multimodal | | |
| 16 | VPBSD: Vessel-Pattern-Based Semi-Supervised Distillation for Efficient 3D Microscopic Cerebrovascular Segmentation | Proposes a vessel-pattern-based semi-supervised distillation method (VpbSD) for efficient 3D microscopic cerebrovascular segmentation | distillation | | |
| 17 | Long-Tailed Object Detection Pre-training: Dynamic Rebalancing Contrastive Learning with Dual Reconstruction | Proposes 2DRCL for long-tailed object detection pre-training, improving performance on tail classes | contrastive learning | | |
| 18 | BEARD: Benchmarking the Adversarial Robustness for Dataset Distillation | BEARD: a benchmark for evaluating the adversarial robustness of dataset distillation, addressing the lack of security evaluation in existing methods | distillation | | |
🔬 Pillar 1: Robot Control (4 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 19 | How Good is ChatGPT at Audiovisual Deepfake Detection: A Comparative Study of ChatGPT, AI Models and Human Perception | Evaluates ChatGPT's ability to detect audiovisual deepfakes and compares it with AI models and human perception | manipulation spatiotemporal large language model | | |
| 20 | MagicQuill: An Intelligent Interactive Image Editing System | MagicQuill: an intelligent interactive image editing system that uses a multimodal LLM to predict editing intent in real time | manipulation large language model multimodal | | |
| 21 | VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation | VidMan: exploits the implicit dynamics in video diffusion models to improve robot manipulation performance | manipulation world model | | |
| 22 | Computational metaoptics for imaging | Computational metaoptics: combines metasurfaces with computational imaging to overcome the limits of conventional imaging | manipulation | | |
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 23 | JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics and Head Motion Generation | JoyVASA: proposes an audio-driven animation method for portrait and animal images based on decoupled representations and diffusion models | motion generation | ✅ | |