cs.CV(2025-12-22)
📊 26 papers in total | 🔗 6 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (10 🔗1)
Pillar 3: Perception & Semantics (5 🔗3)
Pillar 2: RL & Architecture (4 🔗1)
Pillar 1: Robot Control (3)
Pillar 6: Video Extraction (2)
Pillar 4: Generative Motion (1 🔗1)
Pillar 8: Physics-based Animation (1)
🔬 Pillar 9: Embodied Foundation Models (10 papers)
🔬 Pillar 3: Perception & Semantics (5 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 11 | 4D Gaussian Splatting as a Learned Dynamical System | EvoGS: Reformulates 4D Gaussian splatting as a learnable dynamical system for temporally consistent dynamic scene modeling | gaussian splatting, splatting | | |
| 12 | GaussianImage++: Boosted Image Representation and Compression with 2D Gaussian Splatting | GaussianImage++: Uses 2D Gaussian splatting to improve image representation and compression performance | gaussian splatting, splatting | | |
| 13 | Retrieving Objects from 3D Scenes with Box-Guided Open-Vocabulary Instance Segmentation | Proposes a 2D-box-guided open-vocabulary instance segmentation method for retrieving objects from 3D scenes | open-vocabulary, open vocabulary | ✅ | |
| 14 | CETCAM: Camera-Controllable Video Generation via Consistent and Extensible Tokenization | CETCAM: Camera-controllable video generation via consistent and extensible tokenization | depth estimation, VGGT, geometric consistency | ✅ | |
| 15 | WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion | WorldWarp: Propagates 3D geometry with asynchronous video diffusion to generate long-horizon, geometrically consistent video | 3DGS, gaussian splatting, splatting | ✅ | |
🔬 Pillar 2: RL & Architecture (4 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 16 | Anatomy-R1: Enhancing Anatomy Reasoning in Multimodal Large Language Models via Anatomical Similarity Curriculum and Group Diversity Augmentation | Anatomy-R1: Improves anatomical reasoning in multimodal large language models through an anatomical similarity curriculum and group diversity augmentation | curriculum learning, large language model, multimodal | ✅ | |
| 17 | Pushing the Frontier of Audiovisual Perception with Large-Scale Multimodal Correspondence Learning | Proposes PE-AV: a unified audiovisual perception encoder based on large-scale contrastive learning, enabling cross-modal alignment and retrieval | contrastive learning, multimodal | | |
| 18 | FusionNet: Physics-Aware Representation Learning for Multi-Spectral and Thermal Data via Trainable Signal-Processing Priors | FusionNet: Physics-aware representation learning for multispectral and thermal data via trainable signal-processing priors | representation learning | | |
| 19 | WaTeRFlow: Watermark Temporal Robustness via Flow Consistency | Proposes the WaTeRFlow framework to strengthen the temporal robustness of watermarks under image-to-video conversion | world model, optical flow | | |
🔬 Pillar 1: Robot Control (3 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 20 | Decoupled Generative Modeling for Human-Object Interaction Synthesis | Proposes DecHOI, which decouples path planning from motion generation to synthesize realistic human-object interactions | manipulation, penetration, human-object interaction | | |
| 21 | Zero-shot Reconstruction of In-Scene Object Manipulation from Video | Proposes the first system for zero-shot reconstruction of in-scene object manipulation from monocular video | manipulation, scene reconstruction, physically plausible | | |
| 22 | VLNVerse: A Benchmark for Vision-Language Navigation with Versatile, Embodied, Realistic Simulation and Evaluation | VLNVerse: A versatile, embodied, realistic simulation and evaluation benchmark for vision-language navigation | locomotion, sim-to-real, embodied AI | | |
🔬 Pillar 6: Video Extraction (2 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 23 | PEDESTRIAN: An Egocentric Vision Dataset for Obstacle Detection on Pavements | Proposes PEDESTRIAN, a pedestrian-viewpoint obstacle detection dataset aimed at improving safety on urban pavements | egocentric, egocentric vision | | |
| 24 | Hand-Aware Egocentric Motion Reconstruction with Sequence-Level Context | Proposes HaMoS: a hand-aware, sequence-level diffusion framework for egocentric motion reconstruction | egocentric, egocentric vision | | |
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 25 | OmniMoGen: Unifying Human Motion Generation via Learning from Interleaved Text-Motion Instructions | OmniMoGen: Unifies human motion generation tasks by learning from interleaved text-motion instructions | text-to-motion, motion generation, VQ-VAE | ✅ | |
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 26 | Towards AI-Guided Open-World Ecological Taxonomic Classification | Proposes TaxoNet to address long-tailed distributions and domain shift in open-world ecological classification | spatiotemporal, foundation model, multimodal | | |