cs.CV (2025-06-14)

📊 13 papers in total | 🔗 4 with code

🎯 Research Area Navigation

Pillar 2: RL & Architecture (5) | Pillar 9: Embodied Foundation Models (3 🔗2) | Pillar 3: Perception & Semantics (2 🔗2) | Pillar 8: Physics-based Animation (2) | Pillar 1: Robot Control (1)

🔬 Pillar 2: RL & Architecture (5 papers)

#	Title	One-line Summary	Tags	🔗	⭐
1	InverTune: Removing Backdoors from Multimodal Contrastive Learning Models via Trigger Inversion and Activation Tuning	InverTune: removes backdoors from multimodal contrastive learning models via trigger inversion and activation tuning	contrastive learning foundation model multimodal
2	Doctor Approved: Generating Medically Accurate Skin Disease Images through AI-Expert Feedback	MAGIC: uses AI-expert feedback to generate medically accurate skin disease images, improving diagnostic model performance.	reinforcement learning DPO direct preference optimization
3	Efficient Star Distillation Attention Network for Lightweight Image Super-Resolution	Proposes the Star Distillation Attention Network (SDAN) for lightweight image super-resolution.	representation learning distillation
4	MS-UMamba: An Improved Vision Mamba Unet for Fetal Abdominal Medical Image Segmentation	MS-UMamba: an improved Vision Mamba Unet for fetal abdominal medical image segmentation	Mamba SSM
5	MonoVQD: Monocular 3D Object Detection with Variational Query Denoising and Self-Distillation	MonoVQD: monocular 3D object detection based on variational query denoising and self-distillation	distillation

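🔬 Pillar 9: Embodied Foundation Models (3 papers)

#	Title	One-line Summary	Tags	🔗	⭐
6	Pushing the Limits of Safety: A Technical Report on the ATLAS Challenge 2025	ATLAS Challenge 2025: evaluates and improves the safety of multimodal large language models against adversarial attacks	large language model multimodal	✅
7	Not All Tokens and Heads Are Equally Important: Dual-Level Attention Intervention for Hallucination Mitigation	Proposes VisFlow, which mitigates hallucination in large vision-language models through dual-level attention intervention	multimodal
8	Exploring Audio Cues for Enhanced Test-Time Video Model Adaptation	Proposes an audio-assisted test-time adaptation method for video models that improves generalization in noisy environments	large language model	✅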

🔬 Pillar 3: Perception & Semantics (2 papers)

#	Title	One-line Summary	Tags	🔗	⭐
9	Perceptual-GS: Scene-adaptive Perceptual Densification for Gaussian Splatting	Proposes a scene-adaptive perceptual densification method to optimize the spatial distribution of Gaussians	3D gaussian splatting 3DGS gaussian splatting	✅
10	Inference-Time Gaze Refinement for Micro-Expression Recognition: Enhancing Event-Based Eye Tracking with Motion-Aware Post-Processing	Proposes a motion-aware, inference-time gaze refinement framework that improves the accuracy of event-camera eye tracking for micro-expression recognition.	optical flow multimodal	✅

🔬 Pillar 8: Physics-based Animation (2 papers)

#	Title	One-line Summary	Tags	🔗	⭐
11	Benchmarking the Trustworthiness in Multimodal LLMs for Video Understanding	Trust-videoLLMs: the first comprehensive benchmark for evaluating the trustworthiness of multimodal LLMs in video understanding	spatiotemporal large language model multimodal
12	Demographics-Informed Neural Network for Multi-Modal Spatiotemporal forecasting of Urban Growth and Travel Patterns Using Satellite Imagery	Proposes a demographics-informed neural network for multimodal spatiotemporal forecasting of urban growth and travel patterns	spatiotemporal multimodal

🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-line Summary	Tags	🔗	⭐
13	GroupNL: Low-Resource and Robust CNN Design over Cloud and Device	GroupNL: a low-resource, highly robust CNN design for cloud-device collaboration	manipulation
