cs.CV (2025-12-07)

📊 21 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception and Semantics (7 🔗4) Pillar 2: RL Algorithms and Architecture (7) Pillar 9: Embodied Foundation Models (6 🔗1) Pillar 8: Physics-based Animation (1)

🔬 Pillar 3: Spatial Perception and Semantics (7 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	RAVE: Rate-Adaptive Visual Encoding for 3D Gaussian Splatting	Proposes RAVE, a rate-adaptive visual encoding method for 3D Gaussian Splatting	3D gaussian splatting 3DGS gaussian splatting	✅
2	RDSplat: Robust Watermarking Against Diffusion Editing for 3D Gaussian Splatting	Proposes RDSplat, which makes 3D Gaussian Splatting watermarks more robust to diffusion-based editing	3D gaussian splatting 3DGS gaussian splatting
3	1 + 1 > 2: Detector-Empowered Video Large Language Model for Spatio-Temporal Grounding and Reasoning	Proposes DEViL, a video large language model combined with an open-vocabulary detector for spatio-temporal grounding and reasoning	open-vocabulary open vocabulary large language model	✅
4	CoT4Det: A Chain-of-Thought Framework for Perception-Oriented Vision-Language Tasks	CoT4Det: a chain-of-thought framework for perception-oriented vision-language tasks	depth estimation chain-of-thought
5	MeshSplatting: Differentiable Rendering with Opaque Meshes	Proposes MeshSplatting, which optimizes mesh geometry and appearance through differentiable rendering for real-time novel view synthesis	3D gaussian splatting gaussian splatting splatting	✅
6	Overcoming Small Data Limitations in Video-Based Infant Respiration Estimation	Proposes the AIR-400 dataset and a respiration estimation algorithm to overcome the small-sample problem in video-based infant respiration estimation	optical flow spatiotemporal
7	Boosting Unsupervised Video Instance Segmentation with Automatic Quality-Guided Self-Training	AutoQ-VIS: improves unsupervised video instance segmentation through quality-guided self-training	optical flow	✅

🔬 Pillar 2: RL Algorithms and Architecture (7 papers)

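#	Title	One-line summary	Tags	🔗	⭐
8	The Role of Entropy in Visual Grounding: Analysis and Optimization	Proposes the ECVGPO algorithm, which optimizes multimodal large language models on visual grounding tasks through entropy control	reinforcement learning large language model multimodal
9	MMDuet2: Enhancing Proactive Interaction of Video MLLMs with Multi-Turn Reinforcement Learning	MMDuet2: enhances the proactive interaction ability of video MLLMs through multi-turn reinforcement learning	reinforcement learning large language model multimodal
10	TextMamba: Scene Text Detector with Mamba	TextMamba: a scene text detector that incorporates Mamba's selection mechanism to improve long-sequence information extraction	Mamba state space model
11	Masked Autoencoder Pretraining on Strong-Lensing Images for Joint Dark-Matter Model Classification and Super-Resolution	Proposes a masked-autoencoder pretraining method on strong gravitational lensing images for dark-matter model classification and super-resolution reconstruction	masked autoencoder MAE
12	EMGauss: Continuous Slice-to-3D Reconstruction via Dynamic Gaussian Modeling in Volume Electron Microscopy	EMGauss: a continuous slice-to-3D reconstruction method based on dynamic Gaussian modeling for volume electron microscopy	teacher-student gaussian splatting splatting
13	VDOT: Efficient Unified Video Creation via Optimal Transport Distillation	VDOT: efficient unified video generation via optimal transport distillation	distillation
14	RunawayEvil: Jailbreaking the Image-to-Video Generative Models	Proposes the RunawayEvil framework for jailbreaking the safety mechanisms of image-to-video generative models	reinforcement learning multimodal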

🔬 Pillar 9: Embodied Foundation Models (6 papers)

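#	Title	One-line summary	Tags	🔗	⭐
15	NeuroABench: A Multimodal Evaluation Benchmark for Neurosurgical Anatomy Identification	NeuroABench: a multimodal evaluation benchmark for neurosurgical anatomy identification	large language model multimodal
16	Stitch and Tell: A Structured Multimodal Data Augmentation Method for Spatial Understanding	Proposes Stitch and Tell, which improves the spatial understanding of vision-language models through structured multimodal data augmentation	multimodal
17	Less Is More, but Where? Dynamic Token Compression via LLM-Guided Keyframe Prior	Proposes DyToK to address dynamic token compression in long-video understanding	large language model	✅
18	RMAdapter: Reconstruction-based Multi-Modal Adapter for Vision-Language Models	Proposes RMAdapter, which improves the generalization of vision-language models in few-shot learning through reconstruction learning	multimodal
19	Generalized Geometry Encoding Volume for Real-time Stereo Matching	Proposes GGEV, a real-time stereo matching network with strong generalization ability	foundation model
20	Personalized Image Descriptions from Attention Sequences	DEPER: uses personalized attention sequences to generate image descriptions that better match human perception	multimodal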

🔬 Pillar 8: Physics-based Animation (1 paper)

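#	Title	One-line summary	Tags	🔗	⭐
21	Pseudo Anomalies Are All You Need: Diffusion-Based Generation for Weakly-Supervised Video Anomaly Detection	Proposes PA-VAD, which uses diffusion models to generate pseudo-anomalous videos, addressing the scarcity of anomaly data in weakly-supervised video anomaly detection	spatiotemporal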
