cs.CV (2025-02-25)

📊 20 papers in total | 🔗 7 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (7 🔗4) · Pillar 3: Perception & Semantics (4) · Pillar 2: RL & Architecture (3 🔗1) · Pillar 8: Physics-based Animation (3 🔗1) · Pillar 6: Video Extraction (3 🔗1)

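🔬 Pillar 9: Embodied Foundation Models (7 papers)

# | Title | One-sentence summary | Tags | 🔗
1 | Detecting Offensive Memes with Social Biases in Singapore Context Using Multimodal Large Language Models | Detects offensive memes carrying social biases in the Singapore context using multimodal large language models | large language model, multimodal
2 | LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven Language Representation | LDGen: enhances text-to-image synthesis through LLM-driven language representations | large language model
3 | PromptMID: Modal Invariant Descriptors Based on Diffusion and Vision Foundation Models for Optical-SAR Image Matching | PromptMID: modality-invariant descriptors based on diffusion and vision foundation models for optical-SAR image matching | foundation model
4 | Examining the Threat Landscape: Foundation Models and Model Stealing | Reveals that foundation models are vulnerable to model stealing attacks and offers recommendations for secure deployment | foundation model
5 | VOILA: Evaluation of MLLMs For Perceptual Understanding and Analogical Reasoning | Proposes the VOILA benchmark for evaluating the perceptual understanding and analogical reasoning abilities of MLLMs | large language model, multimodal
6 | A Fusion Model for Artwork Identification Based on Convolutional Neural Networks and Transformers | Proposes an artwork identification model that fuses CNNs and Transformers to improve image classification accuracy | multimodal
7 | IMPROVE: Iterative Model Pipeline Refinement and Optimization Leveraging LLM Experts | IMPROVE: uses LLM experts to iteratively refine machine learning pipelines, improving object classification performance | large language model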

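🔬 Pillar 3: Perception & Semantics (4 papers)

# | Title | One-sentence summary | Tags | 🔗
8 | UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting | UniGS: proposes a unified language-image-3D pretraining method based on Gaussian splatting | 3D gaussian splatting, 3DGS, gaussian splatting
9 | OpenFly: A Comprehensive Platform for Aerial Vision-Language Navigation | OpenFly: a comprehensive platform and large-scale benchmark dataset for aerial vision-language navigation | 3D gaussian splatting, gaussian splatting, splatting
10 | VLM-E2E: Enhancing End-to-End Autonomous Driving with Multimodal Driver Attention Fusion | VLM-E2E: enhances end-to-end autonomous driving with multimodal driver attention fusion | scene understanding, multimodal
11 | Realistic Clothed Human and Object Joint Reconstruction from a Single Image | Proposes a framework based on implicit representations and attention mechanisms for reconstructing realistic clothed humans and objects from a single image | implicit representation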

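🔬 Pillar 2: RL & Architecture (3 papers)

# | Title | One-sentence summary | Tags | 🔗
12 | Progressive Local Alignment for Medical Multimodal Pre-training | Proposes PLAN, a progressive local alignment network that improves medical multimodal pre-training | contrastive learning, multimodal
13 | SYNTHIA: Novel Concept Design with Affordance Composition | SYNTHIA: a framework for novel concept design based on affordance composition | curriculum learning, affordance
14 | OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference | OmniAlign-V: a dataset and evaluation benchmark for strengthening the alignment of multimodal large language models with human preferences | DPO, direct preference optimization, large language model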

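🔬 Pillar 8: Physics-based Animation (3 papers)

# | Title | One-sentence summary | Tags | 🔗
15 | ASurvey: Spatiotemporal Consistency in Video Generation | Survey: research progress on spatiotemporal consistency in video generation | spatiotemporal, foundation model
16 | LightFC-X: Lightweight Convolutional Tracker for RGB-X Tracking | Proposes LightFC-X, a lightweight convolutional RGB-X tracker for multimodal object tracking on resource-constrained devices | spatiotemporal, multimodal
17 | A digital eye-fixation biomarker using a deep anomaly scheme to classify Parkisonian patterns | Proposes an eye-fixation biomarker based on deep anomaly detection for classifying Parkinsonian patterns | spatiotemporal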

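🔬 Pillar 6: Video Extraction (3 papers)

# | Title | One-sentence summary | Tags | 🔗
18 | EgoSim: An Egocentric Multi-view Simulator and Real Dataset for Body-worn Cameras during Motion and Activity | Proposes EgoSim to address motion and activity recognition with body-worn cameras | egocentric, motion tracking
19 | Personalized Federated Learning for Egocentric Video Gaze Estimation with Comprehensive Parameter Frezzing | Proposes FedCPF, which achieves personalized federated learning for gaze estimation through comprehensive parameter freezing | egocentric, Ego4D
20 | Task Graph Maximum Likelihood Estimation for Procedural Activity Understanding in Egocentric Videos | Proposes a gradient-based maximum likelihood estimation method for task graphs, used to understand procedural activities in egocentric videos | egocentric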
