cs.CV (2026-02-22)

📊 24 papers in total | 🔗 9 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (10 🔗3) · Pillar 2: RL & Architecture (5 🔗3) · Pillar 3: Perception & Semantics (3 🔗1) · Pillar 6: Video Extraction (2 🔗1) · Pillar 4: Generative Motion (1) · Pillar 1: Robot Control (1) · Pillar 8: Physics-based Animation (1) · Pillar 7: Motion Retargeting (1 🔗1)

🔬 Pillar 9: Embodied Foundation Models (10 papers)

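| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | CREM: Compression-Driven Representation Enhancement for Multimodal Retrieval and Comprehension | CREM: unifies multimodal retrieval and comprehension through compression-driven representation enhancement. | large language model, multimodal | |
| 2 | UniE2F: A Unified Diffusion Framework for Event-to-Frame Reconstruction with Video Foundation Models | UniE2F: a unified diffusion framework for event-to-frame reconstruction using video foundation models. | foundation model | |
| 3 | EMAD: Evidence-Centric Grounded Multimodal Diagnosis for Alzheimer's Disease | EMAD: an evidence-driven multimodal diagnosis framework for Alzheimer's disease. | multimodal | |
| 4 | CaReFlow: Cyclic Adaptive Rectified Flow for Multimodal Fusion | Proposes CaReFlow, a cyclic adaptive rectified flow that addresses the modality gap in multimodal fusion. | multimodal | |
| 5 | Direction-aware 3D Large Multimodal Models | Proposes direction-aware 3D large multimodal models to address the lack of ego pose. | multimodal | |
| 6 | An interpretable framework using foundation models for fish sex identification | Proposes FishProtoNet, which uses prototype networks and foundation models for non-invasive sex identification of the endangered delta smelt. | foundation model | |
| 7 | A Benchmark and Knowledge-Grounded Framework for Advanced Multimodal Personalization Study | Proposes the Life-Bench multimodal benchmark and the LifeGraph knowledge-graph framework to advance research on advanced personalization. | multimodal | |
| 8 | No Need For Real Anomaly: MLLM Empowered Zero-Shot Video Anomaly Detection | Proposes LAVIDA, an MLLM-empowered zero-shot video anomaly detection method that needs no real anomaly data. | large language model, multimodal | |
| 9 | Adaptive Data Augmentation with Multi-armed Bandit: Sample-Efficient Embedding Calibration for Implicit Pattern Recognition | ADAMAB: a few-shot embedding calibration framework based on adaptive data augmentation with a multi-armed bandit. | foundation model | |
| 10 | VIGiA: Instructional Video Guidance via Dialogue Reasoning and Retrieval | VIGiA: instructional video guidance through dialogue reasoning and retrieval. | multimodal | |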

🔬 Pillar 2: RL & Architecture (5 papers)

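| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 11 | Universal 3D Shape Matching via Coarse-to-Fine Language Guidance | Proposes UniMatch, which achieves universal 3D shape matching via coarse-to-fine language guidance. | contrastive learning, large language model, multimodal | |
| 12 | GUIDE-US: Grade-Informed Unpaired Distillation of Encoder Knowledge from Histopathology to Micro-UltraSound | Proposes GUIDE-US, which improves micro-ultrasound prostate cancer grading through unpaired distillation of histopathology knowledge. | distillation, foundation model | |
| 13 | MRI Contrast Enhancement Kinetics World Model | Proposes MRI CEKWorld, a model that improves simulation of MRI contrast enhancement kinetics through spatiotemporal consistency learning. | world model, spatiotemporal | |
| 14 | GS-CLIP: Zero-shot 3D Anomaly Detection by Geometry-Aware Prompt and Synergistic View Representation Learning | GS-CLIP: zero-shot 3D anomaly detection based on geometry-aware prompts and synergistic view representation learning. | representation learning, distillation | |
| 15 | JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation | JavisDiT++: proposes a unified modeling and optimization framework for high-quality joint audio-video generation. | DPO, direct preference optimization, multimodal | |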

🔬 Pillar 3: Perception & Semantics (3 papers)

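| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 16 | DefenseSplat: Enhancing the Robustness of 3D Gaussian Splatting via Frequency-Aware Filtering | DefenseSplat: enhances the robustness of 3D Gaussian Splatting via frequency-aware filtering. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 17 | OpenVO: Open-World Visual Odometry with Temporal Dynamics Awareness | OpenVO: proposes an open-world visual odometry framework with temporal dynamics awareness. | visual odometry, foundation model | |
| 18 | TeFlow: Enabling Multi-frame Supervision for Self-Supervised Feed-forward Scene Flow Estimation | TeFlow: improves self-supervised feed-forward scene flow estimation through temporal consistency supervision. | scene flow | |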

🔬 Pillar 6: Video Extraction (2 papers)

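| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 19 | Learning Cross-View Object Correspondence via Cycle-Consistent Mask Prediction | Proposes cycle-consistent mask prediction to solve cross-view object correspondence. | egocentric | |
| 20 | Keep it SymPL: Symbolic Projective Layout for Allocentric Spatial Reasoning in Vision-Language Models | Proposes the SymPL framework to tackle allocentric spatial reasoning in vision-language models. | egocentric, spatial relationship | |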

🔬 Pillar 4: Generative Motion (1 paper)

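| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 21 | VLM-Guided Group Preference Alignment for Diffusion-based Human Mesh Recovery | Proposes a VLM-guided group preference alignment framework that improves the realism and consistency of diffusion-based human mesh recovery. | physically plausible, human mesh recovery, HMR | |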

🔬 Pillar 1: Robot Control (1 paper)

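| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 22 | Controlled Face Manipulation and Synthesis for Data Augmentation | Proposes a controllable face manipulation method based on diffusion autoencoders, used as data augmentation to improve expression recognition. | manipulation | |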

🔬 Pillar 8: Physics-based Animation (1 paper)

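| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 23 | FUSAR-GPT: A Spatiotemporal Feature-Embedded and Two-Stage Decoupled Visual Language Model for SAR Imagery | FUSAR-GPT: a spatiotemporal feature-embedded, two-stage decoupled vision-language model for SAR imagery. | spatiotemporal | |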

🔬 Pillar 7: Motion Retargeting (1 paper)

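| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 24 | Ani3DHuman: Photorealistic 3D Human Animation with Self-guided Stochastic Sampling | Ani3DHuman: photorealistic 3D human animation combining kinematics and diffusion priors. | motion representation | |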
