cs.CV (2025-04-11)
📊 28 papers in total | 🔗 7 with code
🎯 Interest Area Navigation
Pillar 2: RL & Architecture (7 🔗1)
Pillar 1: Robot Control (6 🔗2)
Pillar 9: Embodied Foundation Models (6 🔗2)
Pillar 3: Perception & Semantics (5)
Pillar 7: Motion Retargeting (2 🔗1)
Pillar 4: Generative Motion (1)
Pillar 6: Video Extraction (1 🔗1)
🔬 Pillar 2: RL & Architecture (7 papers)
🔬 Pillar 1: Robot Control (6 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | FMLGS: Fast Multilevel Language Embedded Gaussians for Part-level Interactive Agents | Proposes FMLGS, which speeds up building and querying part-level interactive agents in 3D Gaussian Splatting. | manipulation 3D gaussian splatting 3DGS | | |
| 9 | MBE-ARI: A Multimodal Dataset Mapping Bi-directional Engagement in Animal-Robot Interaction | Proposes the MBE-ARI multimodal dataset to advance research on bidirectional communication in animal-robot interaction. | quadruped legged robot multimodal | ✅ | |
| 10 | Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization | Proposes Video-MSG to address layout control in text-to-video generation. | manipulation multimodal | | |
| 11 | Latent Diffusion Autoencoders: Toward Efficient and Meaningful Unsupervised Representation Learning in Medical Imaging | Proposes the Latent Diffusion Autoencoder (LDAE) for efficient and meaningful unsupervised representation learning on medical images. | manipulation representation learning MAE | ✅ | |
| 12 | A Knowledge-guided Adversarial Defense for Resisting Malicious Visual Manipulation | Proposes a knowledge-guided adversarial defense (KGAD) to resist malicious visual manipulation. | manipulation | | |
| 13 | Multi-person Physics-based Pose Estimation for Combat Sports | Proposes a physics-based multi-person pose estimation framework that improves 3D pose estimation accuracy in combat sports scenes. | trajectory optimization | | |
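🔬 Pillar 9: Embodied Foundation Models (6 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 14 | Robust SAM: On the Adversarial Robustness of Vision Foundation Models | Proposes an adversarial robustness framework that improves SAM's defensive capability and its accuracy trade-off under different prompts. | foundation model | | |
| 15 | Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images | Proposes a multimodal-LLM-based system for analyzing temporal trends in large-scale image collections. | multimodal | | |
| 16 | Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model | Seaweed-7B: a cost-effective approach to training a video generation foundation model. | foundation model | | |
| 17 | LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs | Proposes LMM4LMM, an LMM-based automatic evaluation metric for image generation, together with the EvalMi-50K benchmark dataset. | multimodal | ✅ | |
| 18 | VLMT: Vision-Language Multimodal Transformer for Multimodal Multi-hop Question Answering | Proposes VLMT to address insufficient cross-modal reasoning in multimodal multi-hop question answering. | multimodal | | |
| 19 | F$^3$Set: Towards Analyzing Fast, Frequent, and Fine-grained Events from Videos | Proposes the F$^3$Set benchmark dataset for analyzing fast, frequent, and fine-grained events in videos, along with the F$^3$ED model. | TAMP | ✅ | |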
🔬 Pillar 3: Perception & Semantics (5 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 20 | Cut-and-Splat: Leveraging Gaussian Splatting for Synthetic Data Generation | Uses Gaussian Splatting to generate synthetic data that improves the training of instance segmentation models. | depth estimation monocular depth gaussian splatting | | |
| 21 | HAL-NeRF: High Accuracy Localization Leveraging Neural Radiance Fields | HAL-NeRF: high-accuracy camera localization using neural radiance fields. | NeRF neural radiance field | | |
| 22 | Parameter-Free Fine-tuning via Redundancy Elimination for Vision Foundation Models | Proposes a parameter-free fine-tuning method based on redundancy elimination for adapting vision foundation models to downstream tasks. | depth estimation foundation model | | |
| 23 | Hardware, Algorithms, and Applications of the Neuromorphic Vision Sensor: a Review | Review of the hardware, algorithms, and applications of neuromorphic vision sensors. | optical flow | | |
| 24 | GeoTexBuild: 3D Building Model Generation from Map Footprints | GeoTexBuild: proposes a new framework for generating 3D building models from map footprints. | height map | | |
🔬 Pillar 7: Motion Retargeting (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 25 | Geometric Consistency Refinement for Single Image Novel View Synthesis via Test-Time Adaptation of Diffusion Models | Proposes a geometric consistency refinement method based on test-time adaptation of diffusion models, improving the quality of single-image novel view synthesis. | geometric consistency | | |
| 26 | RealCam-Vid: High-resolution Video Dataset with Dynamic Scenes and Metric-scale Camera Movements | Proposes RealCam-Vid, the first high-resolution video dataset with dynamic scenes and metric-scale camera movements. | geometric consistency | ✅ | |
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 27 | Ego4o: Egocentric Human Motion Capture and Understanding from Multi-Modal Input | Proposes the Ego4o framework to address human motion capture from multimodal input. | VQ-VAE egocentric | | |
🔬 Pillar 6: Video Extraction (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 28 | The Invisible EgoHand: 3D Hand Forecasting through EgoBody Pose Estimation | Proposes EgoH4 to address visibility limitations in hand pose forecasting. | egocentric | ✅ | |