cs.CV (2025-05-13)

📊 22 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (7 🔗1) | Pillar 2: RL Algorithms & Architecture (6) | Pillar 9: Embodied Foundation Models (6 🔗2) | Pillar 1: Robot Control (2 🔗1) | Pillar 8: Physics-based Animation (1 🔗1)

🔬 Pillar 3: Spatial Perception & Semantics (7 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	DLO-Splatting: Tracking Deformable Linear Objects Using 3D Gaussian Splatting	DLO-Splatting: tracks deformable linear objects using 3D Gaussian Splatting.	3D gaussian splatting gaussian splatting splatting
2	ADC-GS: Anchor-Driven Deformable and Compressed Gaussian Splatting for Dynamic Scene Reconstruction	Proposes ADC-GS, which uses anchor-driven deformable and compressed Gaussian Splatting for efficient dynamic scene reconstruction.	gaussian splatting splatting scene reconstruction	✅
3	A Survey of 3D Reconstruction with Event Cameras	The first survey of 3D reconstruction with event cameras, systematically reviewing methods and outlining future directions.	3D gaussian splatting 3DGS gaussian splatting
4	Monocular Depth Guided Occlusion-Aware Disparity Refinement via Semi-supervised Learning in Laparoscopic Images	Proposes a depth-guided, occlusion-aware disparity refinement network for disparity estimation in surgical images.	monocular depth optical flow
5	Boosting Zero-shot Stereo Matching using Large-scale Mixed Images Sources in the Real World	BooSTer: improves zero-shot stereo matching using large-scale mixed image sources.	depth estimation monocular depth foundation model
6	EventDiff: A Unified and Efficient Diffusion Model Framework for Event-based Video Frame Interpolation	EventDiff: a unified and efficient diffusion model framework for event-based video frame interpolation.	optical flow
7	SpNeRF: Memory Efficient Sparse Volumetric Neural Rendering Accelerator for Edge Devices	SpNeRF: a memory-efficient sparse volumetric neural rendering accelerator for edge devices.	NeRF

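🔬 Pillar 2: RL Algorithms & Architecture (6 papers)

#	Title	One-line summary	Tags	🔗	⭐
8	Reinforcement Learning meets Masked Video Modeling : Trajectory-Guided Adaptive Token Selection	Proposes trajectory-aware adaptive token selection to address the masking-strategy problem in video modeling.	reinforcement learning PPO masked autoencoder
9	DFA-CON: A Contrastive Learning Approach for Detecting Copyright Infringement in DeepFake Art	DFA-CON: a contrastive learning method for detecting copyright infringement in DeepFake art.	contrastive learning foundation model
10	Adaptive Security Policy Management in Cloud Environments Using Reinforcement Learning	Proposes a reinforcement learning-based framework for adaptive security policy management in cloud environments.	reinforcement learning deep reinforcement learning
11	OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning	Proposes OpenThinkIMG to address the standardization problem in visual-tool reinforcement learning.	reinforcement learning
12	Leveraging Multi-Modal Information to Enhance Dataset Distillation	Proposes a multi-modal dataset distillation framework that uses text information and object masks to improve image dataset distillation.	distillation
13	MoKD: Multi-Task Optimization for Knowledge Distillation	Proposes MoKD, which applies multi-task optimization to knowledge distillation to resolve gradient conflicts and the knowledge gap.	distillation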

🔬 Pillar 9: Embodied Foundation Models (6 papers)

#	Title	One-line summary	Tags	🔗	⭐
14	An integrated language-vision foundation model for conversational diagnostics and triaging in primary eye care	Proposes Meta-EyeFM, an integrated language-vision foundation model for primary eye care.	large language model foundation model
15	Generative AI for Autonomous Driving: Frontiers and Opportunities	Survey paper: explores the frontiers and opportunities of generative AI in autonomous driving.	embodied AI large language model multimodal	✅
16	Multimodal Fusion of Glucose Monitoring and Food Imagery for Caloric Content Prediction	Proposes a multimodal fusion method that predicts the caloric content of food from glucose monitoring data and food images.	multimodal
17	Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training	PRIOR: enhances vision-language pre-training by prioritizing image-related tokens.	large language model
18	Advancing Food Nutrition Estimation via Visual-Ingredient Feature Fusion	Proposes the VIF$^2$ model, which fuses visual and ingredient features to improve the accuracy of dietary nutrition estimation.	multimodal	✅
19	Ultra Lowrate Image Compression with Semantic Residual Coding and Compression-aware Diffusion	ResULIC: ultra-low-rate image compression combining semantic residual coding with compression-aware diffusion.	multimodal

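🔬 Pillar 1: Robot Control (2 papers)

#	Title	One-line summary	Tags	🔗	⭐
20	TT-DF: A Large-Scale Diffusion-Based Dataset and Benchmark for Human Body Forgery Detection	Proposes TT-DF, a large-scale dataset and benchmark of diffusion-forged human bodies for human body forgery detection.	manipulation optical flow spatiotemporal	✅
21	Removing Watermarks with Partial Regeneration using Semantic Information	Proposes SemanticRegen, an image watermark removal method that uses semantic information and effectively attacks existing semantic watermarking schemes.	manipulation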

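🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
22	TiMo: Spatiotemporal Foundation Model for Satellite Image Time Series	TiMo: a spatiotemporal foundation model for satellite image time series that effectively captures multi-scale spatiotemporal relationships.	spatiotemporal foundation model	✅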
