cs.CV (2025-12-07)

📊 21 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (7 🔗4) · Pillar 2: RL Algorithms & Architecture (7) · Pillar 9: Embodied Foundation Models (6 🔗1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 3: Spatial Perception & Semantics (7 papers)

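| # | Title | One-sentence summary | Tags | 🔗 |
| 1 | RAVE: Rate-Adaptive Visual Encoding for 3D Gaussian Splatting | Proposes RAVE, a rate-adaptive visual encoding method for 3D Gaussian Splatting. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 2 | RDSplat: Robust Watermarking Against Diffusion Editing for 3D Gaussian Splatting | Proposes RDSplat, which makes 3D Gaussian Splatting watermarks more robust to diffusion-based editing. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 3 | 1 + 1 > 2: Detector-Empowered Video Large Language Model for Spatio-Temporal Grounding and Reasoning | Proposes DEViL, a video large language model coupled with an open-vocabulary detector for spatio-temporal grounding and reasoning. | open-vocabulary, open vocabulary, large language model | |
| 4 | CoT4Det: A Chain-of-Thought Framework for Perception-Oriented Vision-Language Tasks | CoT4Det: a chain-of-thought framework for perception-oriented vision-language tasks. | depth estimation, chain-of-thought | |
| 5 | MeshSplatting: Differentiable Rendering with Opaque Meshes | Proposes MeshSplatting, which optimizes mesh geometry and appearance through differentiable rendering for real-time novel view synthesis. | 3D gaussian splatting, gaussian splatting, splatting | |
| 6 | Overcoming Small Data Limitations in Video-Based Infant Respiration Estimation | Proposes the AIR-400 dataset and a respiration estimation algorithm to overcome the small-data problem in video-based infant respiration estimation. | optical flow, spatiotemporal | |
| 7 | Boosting Unsupervised Video Instance Segmentation with Automatic Quality-Guided Self-Training | AutoQ-VIS: improves unsupervised video instance segmentation through quality-guided self-training. | optical flow | |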

🔬 Pillar 2: RL Algorithms & Architecture (7 papers)

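| # | Title | One-sentence summary | Tags | 🔗 |
| 8 | The Role of Entropy in Visual Grounding: Analysis and Optimization | Proposes the ECVGPO algorithm, which optimizes multimodal large language models for visual grounding through entropy control. | reinforcement learning, large language model, multimodal | |
| 9 | MMDuet2: Enhancing Proactive Interaction of Video MLLMs with Multi-Turn Reinforcement Learning | MMDuet2: strengthens the proactive interaction ability of video MLLMs through multi-turn reinforcement learning. | reinforcement learning, large language model, multimodal | |
| 10 | TextMamba: Scene Text Detector with Mamba | TextMamba: a scene text detector that incorporates Mamba's selection mechanism to improve long-sequence information extraction. | Mamba, state space model | |
| 11 | Masked Autoencoder Pretraining on Strong-Lensing Images for Joint Dark-Matter Model Classification and Super-Resolution | Proposes a masked-autoencoder pretraining method on strong gravitational lensing images for dark-matter model classification and super-resolution reconstruction. | masked autoencoder, MAE | |
| 12 | EMGauss: Continuous Slice-to-3D Reconstruction via Dynamic Gaussian Modeling in Volume Electron Microscopy | EMGauss: a continuous slice-to-3D reconstruction method based on dynamic Gaussian modeling for volume electron microscopy. | teacher-student, gaussian splatting, splatting | |
| 13 | VDOT: Efficient Unified Video Creation via Optimal Transport Distillation | VDOT: efficient, unified video generation via optimal transport distillation. | distillation | |
| 14 | RunawayEvil: Jailbreaking the Image-to-Video Generative Models | Proposes the RunawayEvil framework for jailbreaking the safety mechanisms of image-to-video generative models. | reinforcement learning, multimodal | |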

🔬 Pillar 9: Embodied Foundation Models (6 papers)

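| # | Title | One-sentence summary | Tags | 🔗 |
| 15 | NeuroABench: A Multimodal Evaluation Benchmark for Neurosurgical Anatomy Identification | NeuroABench: a multimodal evaluation benchmark for neurosurgical anatomy identification. | large language model, multimodal | |
| 16 | Stitch and Tell: A Structured Multimodal Data Augmentation Method for Spatial Understanding | Proposes Stitch and Tell, a structured multimodal data augmentation method that improves the spatial understanding of vision-language models. | multimodal | |
| 17 | Less Is More, but Where? Dynamic Token Compression via LLM-Guided Keyframe Prior | Proposes DyToK to address dynamic token compression in long-video understanding. | large language model | |
| 18 | RMAdapter: Reconstruction-based Multi-Modal Adapter for Vision-Language Models | Proposes RMAdapter, which uses reconstruction learning to improve the generalization of vision-language models in few-shot learning. | multimodal | |
| 19 | Generalized Geometry Encoding Volume for Real-time Stereo Matching | Proposes GGEV, a real-time stereo matching network with strong generalization. | foundation model | |
| 20 | Personalized Image Descriptions from Attention Sequences | DEPER: uses personalized attention sequences to generate image descriptions that better match human perception. | multimodal | |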

🔬 Pillar 8: Physics-based Animation (1 paper)

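| # | Title | One-sentence summary | Tags | 🔗 |
| 21 | Pseudo Anomalies Are All You Need: Diffusion-Based Generation for Weakly-Supervised Video Anomaly Detection | Proposes PA-VAD, which uses diffusion models to generate pseudo-anomaly videos, addressing the scarcity of anomalous data in weakly-supervised video anomaly detection. | spatiotemporal | |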
