cs.CV (2025-06-14)

📊 13 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (5) · Pillar 9: Embodied Foundation Models (3 🔗2) · Pillar 3: Perception & Semantics (2 🔗2) · Pillar 8: Physics-based Animation (2) · Pillar 1: Robot Control (1)

🔬 Pillar 2: RL & Architecture (5 papers)

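| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | InverTune: Removing Backdoors from Multimodal Contrastive Learning Models via Trigger Inversion and Activation Tuning | InverTune: removes backdoors from multimodal contrastive learning models via trigger inversion and activation tuning. | contrastive learning, foundation model, multimodal | |
| 2 | Doctor Approved: Generating Medically Accurate Skin Disease Images through AI-Expert Feedback | MAGIC: uses AI-expert feedback to generate medically accurate skin disease images, improving diagnostic model performance. | reinforcement learning, DPO, direct preference optimization | |
| 3 | Efficient Star Distillation Attention Network for Lightweight Image Super-Resolution | Proposes the Star Distillation Attention Network (SDAN) for lightweight image super-resolution. | representation learning, distillation | |
| 4 | MS-UMamba: An Improved Vision Mamba Unet for Fetal Abdominal Medical Image Segmentation | MS-UMamba: an improved Vision Mamba Unet for fetal abdominal medical image segmentation. | Mamba, SSM | |
| 5 | MonoVQD: Monocular 3D Object Detection with Variational Query Denoising and Self-Distillation | MonoVQD: monocular 3D object detection based on variational query denoising and self-distillation. | distillation | |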

🔬 Pillar 9: Embodied Foundation Models (3 papers)

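| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 6 | Pushing the Limits of Safety: A Technical Report on the ATLAS Challenge 2025 | ATLAS Challenge 2025: evaluates and improves the safety of multimodal large language models against adversarial attacks. | large language model, multimodal | |
| 7 | Not All Tokens and Heads Are Equally Important: Dual-Level Attention Intervention for Hallucination Mitigation | Proposes VisFlow, which mitigates hallucination in large vision-language models through dual-level attention intervention. | multimodal | |
| 8 | Exploring Audio Cues for Enhanced Test-Time Video Model Adaptation | Proposes an audio-assisted test-time adaptation method for video models that improves generalization in noisy environments. | large language model | |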

🔬 Pillar 3: Perception & Semantics (2 papers)

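| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 9 | Perceptual-GS: Scene-adaptive Perceptual Densification for Gaussian Splatting | Proposes a scene-adaptive perceptual densification method to address the problem of optimizing the distribution of Gaussian points. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 10 | Inference-Time Gaze Refinement for Micro-Expression Recognition: Enhancing Event-Based Eye Tracking with Motion-Aware Post-Processing | Proposes a motion-aware inference-time gaze refinement framework that improves the eye-tracking accuracy of event cameras in micro-expression recognition. | optical flow, multimodal | |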

🔬 Pillar 8: Physics-based Animation (2 papers)

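| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 11 | Benchmarking the Trustworthiness in Multimodal LLMs for Video Understanding | Trust-videoLLMs: the first comprehensive benchmark for evaluating the trustworthiness of multimodal LLMs for video understanding. | spatiotemporal, large language model, multimodal | |
| 12 | Demographics-Informed Neural Network for Multi-Modal Spatiotemporal forecasting of Urban Growth and Travel Patterns Using Satellite Imagery | Proposes a demographics-informed neural network for multimodal spatiotemporal forecasting of urban growth and travel patterns. | spatiotemporal, multimodal | |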

🔬 Pillar 1: Robot Control (1 paper)

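| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 13 | GroupNL: Low-Resource and Robust CNN Design over Cloud and Device | GroupNL: a low-resource, highly robust CNN design for cloud-edge collaboration. | manipulation | |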
