cs.CV (2025-06-14)
📊 13 papers in total | 🔗 4 with code
🎯 Interest Area Navigation
Pillar 2: RL & Architecture (5)
Pillar 9: Embodied Foundation Models (3 🔗2)
Pillar 3: Perception & Semantics (2 🔗2)
Pillar 8: Physics-based Animation (2)
Pillar 1: Robot Control (1)
🔬 Pillar 2: RL & Architecture (5 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | InverTune: Removing Backdoors from Multimodal Contrastive Learning Models via Trigger Inversion and Activation Tuning | InverTune: removes backdoors from multimodal contrastive learning models via trigger inversion and activation tuning | contrastive learning, foundation model, multimodal | | |
| 2 | Doctor Approved: Generating Medically Accurate Skin Disease Images through AI-Expert Feedback | MAGIC: uses AI-expert feedback to generate medically accurate skin disease images, improving diagnostic model performance | reinforcement learning, DPO, direct preference optimization | | |
| 3 | Efficient Star Distillation Attention Network for Lightweight Image Super-Resolution | Proposes the Star Distillation Attention Network (SDAN) for lightweight image super-resolution | representation learning, distillation | | |
| 4 | MS-UMamba: An Improved Vision Mamba Unet for Fetal Abdominal Medical Image Segmentation | MS-UMamba: an improved Vision Mamba Unet for fetal abdominal medical image segmentation | Mamba, SSM | | |
| 5 | MonoVQD: Monocular 3D Object Detection with Variational Query Denoising and Self-Distillation | MonoVQD: monocular 3D object detection based on variational query denoising and self-distillation | distillation | | |
🔬 Pillar 9: Embodied Foundation Models (3 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 6 | Pushing the Limits of Safety: A Technical Report on the ATLAS Challenge 2025 | ATLAS Challenge 2025: evaluates and improves the safety of multimodal large language models against adversarial attacks | large language model, multimodal | ✅ | |
| 7 | Not All Tokens and Heads Are Equally Important: Dual-Level Attention Intervention for Hallucination Mitigation | Proposes VisFlow, which mitigates hallucination in large vision-language models through dual-level attention intervention | multimodal | | |
| 8 | Exploring Audio Cues for Enhanced Test-Time Video Model Adaptation | Proposes an audio-assisted test-time adaptation method for video models that improves generalization in noisy environments | large language model | ✅ | |
🔬 Pillar 3: Perception & Semantics (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 9 | Perceptual-GS: Scene-adaptive Perceptual Densification for Gaussian Splatting | Proposes a scene-adaptive perceptual densification method to optimize the distribution of Gaussian points | 3D gaussian splatting, 3DGS, gaussian splatting | ✅ | |
| 10 | Inference-Time Gaze Refinement for Micro-Expression Recognition: Enhancing Event-Based Eye Tracking with Motion-Aware Post-Processing | Proposes a motion-aware inference-time gaze refinement framework that improves event-camera eye-tracking accuracy for micro-expression recognition | optical flow, multimodal | ✅ | |
🔬 Pillar 8: Physics-based Animation (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 11 | Benchmarking the Trustworthiness in Multimodal LLMs for Video Understanding | Trust-videoLLMs: the first comprehensive trustworthiness benchmark for multimodal LLMs in video understanding | spatiotemporal, large language model, multimodal | | |
| 12 | Demographics-Informed Neural Network for Multi-Modal Spatiotemporal forecasting of Urban Growth and Travel Patterns Using Satellite Imagery | Proposes a demographics-informed neural network for multimodal spatiotemporal forecasting of urban growth and travel patterns | spatiotemporal, multimodal | | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 13 | GroupNL: Low-Resource and Robust CNN Design over Cloud and Device | GroupNL: a low-resource, highly robust CNN design for cloud-edge collaboration | manipulation | | |