cs.CV(2025-05-13)
📊 22 papers in total | 🔗 5 with code
🎯 Interest Area Navigation
Pillar 3: Perception & Semantics (7 🔗1)
Pillar 2: RL & Architecture (6)
Pillar 9: Embodied Foundation Models (6 🔗2)
Pillar 1: Robot Control (2 🔗1)
Pillar 8: Physics-based Animation (1 🔗1)
🔬 Pillar 3: Perception & Semantics (7 papers)
🔬 Pillar 2: RL & Architecture (6 papers)
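| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | Reinforcement Learning meets Masked Video Modeling : Trajectory-Guided Adaptive Token Selection | Proposes trajectory-aware adaptive token selection to address the masking-strategy problem in video modeling. | reinforcement learning, PPO, masked autoencoder | | |
| 9 | DFA-CON: A Contrastive Learning Approach for Detecting Copyright Infringement in DeepFake Art | DFA-CON: a contrastive learning method for detecting copyright infringement in DeepFake art. | contrastive learning, foundation model | | |
| 10 | Adaptive Security Policy Management in Cloud Environments Using Reinforcement Learning | Proposes a reinforcement-learning-based framework for adaptive security policy management in cloud environments. | reinforcement learning, deep reinforcement learning | | |
| 11 | OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning | Proposes OpenThinkIMG to address the standardization problem in visual-tool reinforcement learning. | reinforcement learning | | |
| 12 | Leveraging Multi-Modal Information to Enhance Dataset Distillation | Proposes a multimodal dataset distillation framework that uses text information and object masks to improve image dataset distillation. | distillation | | |
| 13 | MoKD: Multi-Task Optimization for Knowledge Distillation | Proposes MoKD, which applies multi-task optimization to knowledge distillation to resolve gradient conflicts and the knowledge gap. | distillation | | |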
🔬 Pillar 9: Embodied Foundation Models (6 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 14 | An integrated language-vision foundation model for conversational diagnostics and triaging in primary eye care | Proposes Meta-EyeFM, an integrated language-vision foundation model for primary eye care. | large language model, foundation model | | |
| 15 | Generative AI for Autonomous Driving: Frontiers and Opportunities | Survey paper: explores the frontiers and opportunities of generative AI in autonomous driving. | embodied AI, large language model, multimodal | ✅ | |
| 16 | Multimodal Fusion of Glucose Monitoring and Food Imagery for Caloric Content Prediction | Proposes a multimodal fusion method that uses glucose monitoring and food images to predict the caloric content of food. | multimodal | | |
| 17 | Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training | PRIOR: enhances vision-language pre-training by prioritizing image-related tokens. | large language model | | |
| 18 | Advancing Food Nutrition Estimation via Visual-Ingredient Feature Fusion | Proposes the VIF$^2$ model, which fuses visual and ingredient features to improve the accuracy of dietary nutrition estimation. | multimodal | ✅ | |
| 19 | Ultra Lowrate Image Compression with Semantic Residual Coding and Compression-aware Diffusion | ResULIC: ultra-low-bitrate image compression that combines semantic residual coding with compression-aware diffusion. | multimodal | | |
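🔬 Pillar 1: Robot Control (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 20 | TT-DF: A Large-Scale Diffusion-Based Dataset and Benchmark for Human Body Forgery Detection | Proposes TT-DF, a large-scale dataset and benchmark of diffusion-generated human body forgeries, for human body forgery detection. | manipulation, optical flow, spatiotemporal | ✅ | |
| 21 | Removing Watermarks with Partial Regeneration using Semantic Information | Proposes SemanticRegen, an image watermark removal method that uses semantic information and effectively attacks existing semantic watermarking schemes. | manipulation | | |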
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 22 | TiMo: Spatiotemporal Foundation Model for Satellite Image Time Series | TiMo: a spatiotemporal foundation model for satellite image time series that effectively captures multi-scale spatiotemporal relationships. | spatiotemporal, foundation model | ✅ | |