cs.CV (2025-03-12)
📊 34 papers in total | 🔗 7 with code
🎯 Interest area navigation
Pillar 9: Embodied Foundation Models (12 🔗4)
Pillar 3: Perception & Semantics (7 🔗3)
Pillar 2: RL & Architecture (6)
Pillar 1: Robot Control (4)
Pillar 8: Physics-based Animation (3)
Pillar 6: Video Extraction (2)
🔬 Pillar 9: Embodied Foundation Models (12 papers)
🔬 Pillar 3: Perception & Semantics (7 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 13 | Close-up-GS: Enhancing Close-Up View Synthesis in 3D Gaussian Splatting with Progressive Self-Training | Proposes Close-up-GS, which uses progressive self-training to improve close-up view synthesis quality in 3D Gaussian Splatting. | 3D gaussian splatting 3DGS gaussian splatting | | |
| 14 | Motion Blender Gaussian Splatting for Dynamic Scene Reconstruction | Proposes Motion Blender Gaussian Splatting for controllable dynamic scene reconstruction and motion editing. | gaussian splatting splatting scene reconstruction | ✅ | |
| 15 | SDD-4DGS: Static-Dynamic Aware Decoupling in Gaussian Splatting for 4D Scene Reconstruction | SDD-4DGS: 4D scene reconstruction with static-dynamic decoupling, based on Gaussian splatting. | gaussian splatting splatting scene reconstruction | | |
| 16 | GASPACHO: Gaussian Splatting for Controllable Humans and Objects | GASPACHO: a Gaussian splatting based method for rendering controllable human-object interactions. | gaussian splatting splatting physically plausible | ✅ | |
| 17 | OpenVidVRD: Open-Vocabulary Video Visual Relation Detection via Prompt-Driven Semantic Space Alignment | Proposes the OpenVidVRD framework, which achieves open-vocabulary video visual relation detection through prompt-driven semantic space alignment. | open-vocabulary open vocabulary spatiotemporal | | |
| 18 | DitHub: A Modular Framework for Incremental Open-Vocabulary Object Detection | Proposes the DitHub framework to address adaptation in open-vocabulary object detection. | open-vocabulary open vocabulary | ✅ | |
| 19 | Investigation of Frame Differences as Motion Cues for Video Object Segmentation | Proposes a frame-difference-based video object segmentation method suited to resource-constrained edge devices. | optical flow | | |
🔬 Pillar 2: RL & Architecture (6 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 20 | CleverDistiller: Simple and Spatially Consistent Cross-modal Distillation | CleverDistiller: a simple, spatially consistent cross-modal knowledge distillation method that improves 3D perception performance. | distillation semantic map foundation model | | |
| 21 | LuciBot: Automated Robot Policy Learning from Generated Videos | LuciBot: automatically learns robot policies from generated videos, improving performance on complex embodied tasks. | policy learning large language model | | |
| 22 | ViM-VQ: Efficient Post-Training Vector Quantization for Visual Mamba | ViM-VQ: an efficient post-training vector quantization method for Visual Mamba that improves low-bit quantization accuracy. | Mamba state space model | | |
| 23 | Patch-Wise Hypergraph Contrastive Learning with Dual Normal Distribution Weighting for Multi-Domain Stain Transfer | Proposes STNHCL, which performs multi-domain stain transfer via hypergraph contrastive learning and dual normal distribution weighting. | contrastive learning | | |
| 24 | Astrea: A MOE-based Visual Understanding Model with Progressive Alignment | Astrea: a visual understanding model based on MoE and progressive alignment that addresses heterogeneous tasks and expert load imbalance. | contrastive learning multimodal | | |
| 25 | Memory-enhanced Retrieval Augmentation for Long Video Understanding | Proposes MemVid, a memory-enhanced retrieval augmentation method for long video understanding. | reinforcement learning curriculum learning | | |
🔬 Pillar 1: Robot Control (4 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 26 | 2HandedAfforder: Learning Precise Actionable Bimanual Affordances from Human Videos | Proposes 2HandedAfforder, which learns precise, actionable bimanual affordances from human videos. | manipulation bi-manual affordance | | |
| 27 | Oh-A-DINO: Understanding and Enhancing Attribute-Level Information in Self-Supervised Object-Centric Representations | Oh-A-DINO: improves self-supervised object-centric representations by enhancing attribute-level information. | manipulation | | |
| 28 | A PyTorch-Enabled Tool for Synthetic Event Camera Data Generation and Algorithm Development | SENPI: a PyTorch-based tool for synthetic event camera data generation and algorithm development. | manipulation | | |
| 29 | Fully-Synthetic Training for Visual Quality Inspection in Automotive Production | Proposes a fully synthetic training approach for visual quality inspection in automotive production, improving defect detection accuracy. | domain randomization | | |
🔬 Pillar 8: Physics-based Animation (3 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 30 | Bidirectional Learned Facial Animation Codec for Low Bitrate Talking Head Videos | Proposes a bidirectional learned facial animation codec to address low-bitrate video. | ASE | | |
| 31 | I2V3D: Controllable image-to-video generation with 3D guidance | I2V3D: controllable image-to-video generation using 3D guidance. | character animation | | |
| 32 | Pig behavior dataset and Spatial-temporal perception and enhancement networks based on the attention mechanism for pig behavior recognition | Proposes an attention-based spatial-temporal perception and enhancement network for pig behavior recognition, and builds an accompanying dataset. | spatiotemporal | | |
🔬 Pillar 6: Video Extraction (2 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 33 | Exo2Ego: Exocentric Knowledge Guided MLLM for Egocentric Video Understanding | Proposes Exo2Ego, which uses exocentric knowledge to guide an MLLM in egocentric video understanding. | egocentric large language model multimodal | | |
| 34 | Monte Carlo Diffusion for Generalizable Learning-Based RANSAC | Proposes a Monte Carlo diffusion based approach to generalizable learning-based RANSAC, improving robustness on out-of-distribution data. | feature matching | | |