cs.CV (2025-02-12)

📊 16 papers in total | 🔗 6 with code

🎯 Areas of Interest

Pillar 9: Embodied Foundation Models (5 🔗2) | Pillar 2: RL Algorithms & Architecture (5 🔗1) | Pillar 8: Physics-based Animation (3 🔗2) | Pillar 1: Robot Control (2 🔗1) | Pillar 3: Spatial Perception & Semantics (1)

🔬 Pillar 9: Embodied Foundation Models (5 papers)

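#	Title	One-line summary	Tags	🔗	⭐
1	mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data	mmE5: improves multimodal multilingual embeddings using high-quality synthetic data	large language model, multimodal	✅
2	BBQ-V: Benchmarking Visual Stereotype Bias in Large Multimodal Models	BBQ-V: a benchmark for evaluating visual stereotype bias that exposes social biases in large multimodal models	multimodal
3	Foundation Models in Computational Pathology: A Review of Challenges, Opportunities, and Impact	A review of foundation models in computational pathology: challenges, opportunities, and impact	foundation model
4	Learning Human Skill Generators at Key-Step Levels	Proposes the Key-step Skill Generation (KS-Gen) task, which generates the key steps of human skill videos, with the aim of advancing embodied intelligence	large language model, multimodal	✅
5	UniCoRN: Unified Commented Retrieval Network with LMMs	Proposes UniCoRN, which combines multimodal retrieval with large multimodal models (LMMs) to answer complex compositional queries	multimodal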

🔬 Pillar 2: RL Algorithms & Architecture (5 papers)

#	Title	One-line summary	Tags	🔗	⭐
6	CSMAE: Cataract Surgical Masked Autoencoder (MAE) based Pre-training	Proposes CSMAE, a masked autoencoder (MAE)-based pre-training method for cataract surgery videos	masked autoencoder, MAE, spatiotemporal
7	A Novel Approach to for Multimodal Emotion Recognition : Multimodal semantic information fusion	DeepMSI-MER: a multimodal emotion recognition method that combines contrastive learning with visual sequence compression	contrastive learning, multimodal
8	Hi-End-MAE: Hierarchical encoder-driven masked autoencoders are stronger vision learners for medical image segmentation	Hi-End-MAE: hierarchical encoder-driven masked autoencoders that improve medical image segmentation performance	masked autoencoder, MAE	✅
9	Composite Sketch+Text Queries for Retrieving Objects with Elusive Names and Complex Interactions	Proposes the STNET model to tackle image retrieval with composite sketch+text queries	contrastive learning, multimodal
10	A Survey on Data Curation for Visual Contrastive Learning: Why Crafting Effective Positive and Negative Pairs Matters	Surveys data curation methods for visual contrastive learning, focusing on how to construct effective positive and negative pairs	contrastive learning

🔬 Pillar 8: Physics-based Animation (3 papers)

#	Title	One-line summary	Tags	🔗	⭐
11	Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models	Spatial457: a diagnostic benchmark for evaluating the 6D spatial reasoning ability of large multimodal models	PULSE, multimodal	✅
12	Brain Latent Progression: Individual-based Spatiotemporal Disease Progression on 3D Brain MRIs via Latent Diffusion	Proposes Brain Latent Progression (BrLP), which uses a latent diffusion model to predict individualized disease progression on 3D brain MRIs	spatiotemporal	✅
13	Integrating Spatiotemporal Vision Transformer into Digital Twins for High-Resolution Heat Stress Forecasting in Campus Environments	Proposes ST-ViT, a digital twin framework for high-resolution heat stress forecasting in campus environments	spatiotemporal

🔬 Pillar 1: Robot Control (2 papers)

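#	Title	One-line summary	Tags	🔗	⭐
14	Human-Centric Foundation Models: Perception, Generation and Agentic Modeling	Surveys human-centric foundation models that unify perception, generation, and agentic modeling to support digital humans and humanoid avatars	humanoid, foundation model
15	CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation	CineMaster: a 3D-aware, controllable framework for cinematic text-to-video generation	manipulation	✅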

🔬 Pillar 3: Spatial Perception & Semantics (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
16	FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis	FloVD: combines optical flow with a video diffusion model to improve camera-controlled video synthesis	optical flow, motion synthesis
