cs.CV(2025-05-13)

📊 22 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (7 🔗1) · Pillar 2: RL & Architecture (6) · Pillar 9: Embodied Foundation Models (6 🔗2) · Pillar 1: Robot Control (2 🔗1) · Pillar 8: Physics-based Animation (1 🔗1)

🔬 Pillar 3: Perception & Semantics (7 papers)

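| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 1 | DLO-Splatting: Tracking Deformable Linear Objects Using 3D Gaussian Splatting | DLO-Splatting: tracks deformable linear objects using 3D Gaussian splatting. | 3D gaussian splatting, gaussian splatting, splatting | |
| 2 | ADC-GS: Anchor-Driven Deformable and Compressed Gaussian Splatting for Dynamic Scene Reconstruction | Proposes ADC-GS, which achieves efficient dynamic scene reconstruction through anchor-driven deformable and compressed Gaussian splatting. | gaussian splatting, splatting, scene reconstruction | |
| 3 | A Survey of 3D Reconstruction with Event Cameras | The first survey of 3D reconstruction with event cameras, systematically reviewing methods and outlining future directions. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 4 | Monocular Depth Guided Occlusion-Aware Disparity Refinement via Semi-supervised Learning in Laparoscopic Images | Proposes a depth-guided, occlusion-aware disparity refinement network for disparity estimation in surgical images. | monocular depth, optical flow | |
| 5 | Boosting Zero-shot Stereo Matching using Large-scale Mixed Images Sources in the Real World | BooSTer: improves zero-shot stereo matching using large-scale mixed image sources. | depth estimation, monocular depth, foundation model | |
| 6 | EventDiff: A Unified and Efficient Diffusion Model Framework for Event-based Video Frame Interpolation | EventDiff: a unified and efficient diffusion model framework for event-based video frame interpolation. | optical flow | |
| 7 | SpNeRF: Memory Efficient Sparse Volumetric Neural Rendering Accelerator for Edge Devices | SpNeRF: a memory-efficient sparse volumetric neural rendering accelerator for edge devices. | NeRF | |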

🔬 Pillar 2: RL & Architecture (6 papers)

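| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 8 | Reinforcement Learning meets Masked Video Modeling: Trajectory-Guided Adaptive Token Selection | Proposes trajectory-aware adaptive token selection to address the masking-strategy problem in video modeling. | reinforcement learning, PPO, masked autoencoder | |
| 9 | DFA-CON: A Contrastive Learning Approach for Detecting Copyright Infringement in DeepFake Art | DFA-CON: a contrastive-learning method for detecting copyright infringement in DeepFake art. | contrastive learning, foundation model | |
| 10 | Adaptive Security Policy Management in Cloud Environments Using Reinforcement Learning | Proposes a reinforcement-learning-based framework for adaptive security policy management in cloud environments. | reinforcement learning, deep reinforcement learning | |
| 11 | OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning | Proposes OpenThinkIMG to address the lack of standardization in visual tool-augmented learning. | reinforcement learning | |
| 12 | Leveraging Multi-Modal Information to Enhance Dataset Distillation | Proposes a multimodal dataset distillation framework that uses text information and object masks to improve image dataset distillation. | distillation | |
| 13 | MoKD: Multi-Task Optimization for Knowledge Distillation | Proposes MoKD, which applies multi-task optimization to knowledge distillation to resolve gradient conflicts and the knowledge gap. | distillation | |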

🔬 Pillar 9: Embodied Foundation Models (6 papers)

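| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 14 | An integrated language-vision foundation model for conversational diagnostics and triaging in primary eye care | Proposes Meta-EyeFM, an integrated language-vision foundation model for primary eye care. | large language model, foundation model | |
| 15 | Generative AI for Autonomous Driving: Frontiers and Opportunities | Survey paper: explores the frontiers and opportunities of generative AI in autonomous driving. | embodied AI, large language model, multimodal | |
| 16 | Multimodal Fusion of Glucose Monitoring and Food Imagery for Caloric Content Prediction | Proposes a multimodal fusion method that predicts the caloric content of food from glucose monitoring data and food images. | multimodal | |
| 17 | Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training | PRIOR: enhances vision-language pre-training by prioritizing image-related tokens. | large language model | |
| 18 | Advancing Food Nutrition Estimation via Visual-Ingredient Feature Fusion | Proposes the VIF$^2$ model, which fuses visual and ingredient features to improve the accuracy of dietary nutrition estimation. | multimodal | |
| 19 | Ultra Lowrate Image Compression with Semantic Residual Coding and Compression-aware Diffusion | ResULIC: ultra-low-bitrate image compression combining semantic residual coding with compression-aware diffusion. | multimodal | |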

🔬 Pillar 1: Robot Control (2 papers)

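| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 20 | TT-DF: A Large-Scale Diffusion-Based Dataset and Benchmark for Human Body Forgery Detection | Proposes TT-DF, a large-scale diffusion-based human body forgery dataset and benchmark for human body forgery detection. | manipulation, optical flow, spatiotemporal | |
| 21 | Removing Watermarks with Partial Regeneration using Semantic Information | Proposes SemanticRegen, an image watermark removal method that uses semantic information and effectively attacks existing semantic watermarking schemes. | manipulation | |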

🔬 Pillar 8: Physics-based Animation (1 paper)

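| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 22 | TiMo: Spatiotemporal Foundation Model for Satellite Image Time Series | TiMo: a spatiotemporal foundation model for satellite image time series that effectively captures multi-scale spatiotemporal relationships. | spatiotemporal, foundation model | |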
