cs.CV (2025-12-07)
📊 21 papers in total | 🔗 5 with code
🎯 Interest Area Navigation
Pillar 3: Perception & Semantics (7 🔗4)
Pillar 2: RL & Architecture (7)
Pillar 9: Embodied Foundation Models (6 🔗1)
Pillar 8: Physics-based Animation (1)
🔬 Pillar 3: Perception & Semantics (7 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | RAVE: Rate-Adaptive Visual Encoding for 3D Gaussian Splatting | Proposes RAVE, a rate-adaptive visual encoding method for 3D Gaussian Splatting | 3D gaussian splatting 3DGS gaussian splatting | ✅ | |
| 2 | RDSplat: Robust Watermarking Against Diffusion Editing for 3D Gaussian Splatting | Proposes RDSplat, which improves the robustness of 3D Gaussian Splatting watermarks against diffusion-based editing | 3D gaussian splatting 3DGS gaussian splatting | | |
| 3 | 1 + 1 > 2: Detector-Empowered Video Large Language Model for Spatio-Temporal Grounding and Reasoning | Proposes DEViL, a video large language model combined with an open-vocabulary detector for spatio-temporal grounding and reasoning | open-vocabulary open vocabulary large language model | ✅ | |
| 4 | CoT4Det: A Chain-of-Thought Framework for Perception-Oriented Vision-Language Tasks | CoT4Det: a chain-of-thought framework for perception-oriented vision-language tasks | depth estimation chain-of-thought | | |
| 5 | MeshSplatting: Differentiable Rendering with Opaque Meshes | Proposes MeshSplatting, which optimizes mesh geometry and appearance via differentiable rendering to achieve real-time novel view synthesis | 3D gaussian splatting gaussian splatting splatting | ✅ | |
| 6 | Overcoming Small Data Limitations in Video-Based Infant Respiration Estimation | Proposes the AIR-400 dataset and a respiration estimation algorithm to overcome the small-data problem in video-based infant respiration estimation | optical flow spatiotemporal | | |
| 7 | Boosting Unsupervised Video Instance Segmentation with Automatic Quality-Guided Self-Training | AutoQ-VIS: improves unsupervised video instance segmentation via quality-guided self-training | optical flow | ✅ | |
🔬 Pillar 2: RL & Architecture (7 papers)
🔬 Pillar 9: Embodied Foundation Models (6 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 15 | NeuroABench: A Multimodal Evaluation Benchmark for Neurosurgical Anatomy Identification | NeuroABench: a multimodal evaluation benchmark for neurosurgical anatomy identification | large language model multimodal | | |
| 16 | Stitch and Tell: A Structured Multimodal Data Augmentation Method for Spatial Understanding | Proposes Stitch and Tell, a structured multimodal data augmentation method that improves the spatial understanding of vision-language models | multimodal | | |
| 17 | Less Is More, but Where? Dynamic Token Compression via LLM-Guided Keyframe Prior | Proposes DyToK to address dynamic token compression in long-video understanding | large language model | ✅ | |
| 18 | RMAdapter: Reconstruction-based Multi-Modal Adapter for Vision-Language Models | Proposes RMAdapter, which strengthens the generalization of vision-language models in few-shot learning through reconstruction learning | multimodal | | |
| 19 | Generalized Geometry Encoding Volume for Real-time Stereo Matching | Proposes GGEV, a real-time stereo matching network with strong generalization ability | foundation model | | |
| 20 | Personalized Image Descriptions from Attention Sequences | DEPER: uses personalized attention sequences to generate image descriptions that better match human perception | multimodal | | |
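🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 21 | Pseudo Anomalies Are All You Need: Diffusion-Based Generation for Weakly-Supervised Video Anomaly Detection | Proposes PA-VAD, which uses diffusion models to generate pseudo-anomaly videos, addressing the scarcity of anomaly data in weakly-supervised video anomaly detection | spatiotemporal | | |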