cs.CV(2025-03-04)

📊 22 papers in total | 🔗 8 with code

🎯 Topic navigation

Pillar 9: Embodied Foundation Models (6 🔗3)
Pillar 2: RL & Architecture (4 🔗1)
Pillar 3: Perception & Semantics (3 🔗1)
Pillar 4: Generative Motion (3)
Pillar 8: Physics-based Animation (2 🔗1)
Pillar 5: Interaction & Reaction (1 🔗1)
Pillar 7: Motion Retargeting (1 🔗1)
Pillar 1: Robot Control (1)
Pillar 6: Video Extraction (1)

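🔬 Pillar 9: Embodied Foundation Models (6 papers)

# | Title | One-line summary | Tags | 🔗
1 | A Token-level Text Image Foundation Model for Document Understanding | Proposes TokenOCR, a token-level text-image foundation model for document understanding | large language model, foundation model |
2 | Multimodal Deep Learning for Subtype Classification in Breast Cancer Using Histopathological Images and Gene Expression Data | Proposes a multimodal deep learning framework for breast cancer subtype classification | multimodal |
3 | SPIDER: A Comprehensive Multi-Organ Supervised Pathology Dataset and Baseline Models | SPIDER: builds a multi-organ pathology image dataset with baseline models to advance AI pathology research | foundation model, multimodal |
4 | BioD2C: A Dual-level Semantic Consistency Constraint Framework for Biomedical VQA | BioD2C: a dual-level semantic consistency constraint framework that improves biomedical VQA performance | large language model, multimodal |
5 | CADDI: An in-Class Activity Detection Dataset using IMU data from low-cost sensors | CADDI: a classroom activity detection dataset built from low-cost IMU sensors, supporting activity recognition in educational settings | multimodal |
6 | StageDesigner: Artistic Stage Generation for Scenography via Theater Scripts | StageDesigner: a comprehensive framework that generates artistic stage scenes from theater scripts | large language model |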

🔬 Pillar 2: RL & Architecture (4 papers)

# | Title | One-line summary | Tags | 🔗
7 | Developing a PET/CT Foundation Model for Cross-Modal Anatomical and Functional Imaging | Proposes the Cross-Fraternal Twin Masked Autoencoder for cross-modal anatomical and functional PET/CT imaging | representation learning, masked autoencoder, foundation model |
8 | LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning | LLaVE: large language-vision embedding models trained with hardness-weighted contrastive learning, reaching SOTA performance | representation learning, contrastive learning, multimodal |
9 | SSNet: Saliency Prior and State Space Model-based Network for Salient Object Detection in RGB-D Images | Proposes SSNet, a network built on saliency priors and a state space model, for salient object detection in RGB-D images | SSM, state space model, scene understanding |
10 | WMNav: Integrating Vision-Language Models into World Models for Object Goal Navigation | WMNav: an object goal navigation framework that integrates vision-language models with world models | world model, embodied AI |

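🔬 Pillar 3: Perception & Semantics (3 papers)

# | Title | One-line summary | Tags | 🔗
11 | 2DGS-Avatar: Animatable High-fidelity Clothed Avatar via 2D Gaussian Splatting | Proposes 2DGS-Avatar, which renders high-fidelity, animatable clothed avatars in real time via 2D Gaussian Splatting | 3D gaussian splatting, 3DGS, gaussian splatting |
12 | Resource-Efficient Affordance Grounding with Complementary Depth and Semantic Prompts | Proposes the BiT-Align framework, which uses complementary depth and semantic prompts to improve affordance reasoning under resource constraints | affordance, multimodal |
13 | Label-Efficient LiDAR Panoptic Segmentation | Proposes L3PS, which achieves efficient LiDAR panoptic segmentation from a small amount of labeled data | scene understanding |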

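🔬 Pillar 4: Generative Motion (3 papers)

# | Title | One-line summary | Tags | 🔗
14 | SPG: Improving Motion Diffusion by Smooth Perturbation Guidance | SPG: improves the generation quality of motion diffusion models through smooth perturbation guidance | motion diffusion model, motion diffusion |
15 | ARC-Flow : Articulated, Resolution-Agnostic, Correspondence-Free Matching and Interpolation of 3D Shapes Under Flow Fields | Proposes ARC-Flow, which performs correspondence-free matching and interpolation of articulated 3D shapes via flow fields | physically plausible |
16 | Efficient Training-Free High-Resolution Synthesis with Energy Rectification in Diffusion Models | Proposes RectifiedHR, an efficient, training-free method for high-resolution image synthesis with diffusion models | classifier-free guidance |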

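🔬 Pillar 8: Physics-based Animation (2 papers)

# | Title | One-line summary | Tags | 🔗
17 | MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments | Proposes the MM-OR multimodal operating room dataset and the MM2SG model to improve semantic understanding of high-intensity surgical environments | spatiotemporal, multimodal |
18 | TReND: Transformer derived features and Regularized NMF for neonatal functional network Delineation | Proposes the TReND framework, which uses Transformer-derived features and regularized NMF to delineate neonatal functional networks | spatiotemporal |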

🔬 Pillar 5: Interaction & Reaction (1 paper)

# | Title | One-line summary | Tags | 🔗
19 | Seeing is Understanding: Unlocking Causal Attention into Modality-Mutual Attention for Multimodal LLMs | Proposes MapleLeaf AKI, which unlocks causal attention to enable modality-mutual attention in multimodal LLMs | mutual attention, large language model, foundation model |

🔬 Pillar 7: Motion Retargeting (1 paper)

# | Title | One-line summary | Tags | 🔗
20 | CMMLoc: Advancing Text-to-PointCloud Localization with Cauchy-Mixture-Model Based Framework | CMMLoc: a text-to-point-cloud localization framework based on a Cauchy mixture model | spatial relationship |

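🔬 Pillar 1: Robot Control (1 paper)

# | Title | One-line summary | Tags | 🔗
21 | Monocular Person Localization under Camera Ego-motion | Proposes a person localization method that uses a four-point human body model with a moving monocular camera, improving localization accuracy for human-robot interaction | quadruped |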

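🔬 Pillar 6: Video Extraction (1 paper)

# | Title | One-line summary | Tags | 🔗
22 | mmDEAR: mmWave Point Cloud Density Enhancement for Accurate Human Body Reconstruction | Proposes the mmDEAR framework, which densifies mmWave point clouds to improve human body reconstruction accuracy | SMPL |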
