cs.CV (2026-02-14)

📊 18 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (6, 🔗 3) · Pillar 2: RL & Architecture (4, 🔗 1) · Pillar 4: Generative Motion (2) · Pillar 3: Perception & Semantics (2) · Pillar 8: Physics-based Animation (2) · Pillar 1: Robot Control (1) · Pillar 7: Motion Retargeting (1)

🔬 Pillar 9: Embodied Foundation Models (6 papers)

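| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 1 | Foundation Model-Driven Semantic Change Detection in Remote Sensing Imagery | Proposes PerASCD, which uses remote sensing foundation models to improve semantic change detection performance and simplify the pipeline. | foundation model | |
| 2 | A WDLoRA-Based Multimodal Generative Framework for Clinically Guided Corneal Confocal Microscopy Image Synthesis in Diabetic Neuropathy | Proposes a WDLoRA-based multimodal generative framework for clinically guided corneal confocal microscopy image synthesis in diabetic neuropathy. | multimodal | |
| 3 | KorMedMCQA-V: A Multimodal Benchmark for Evaluating Vision-Language Models on the Korean Medical Licensing Examination | Proposes KorMedMCQA-V, a multimodal benchmark for evaluating vision-language models on the Korean Medical Licensing Examination. | multimodal | |
| 4 | OmniScience: A Large-scale Multi-modal Dataset for Scientific Image Understanding | Proposes OmniScience, a large-scale multimodal dataset for improving scientific image understanding. | large language model, multimodal | |
| 5 | LeafNet: A Large-Scale Dataset and Comprehensive Benchmark for Foundational Vision-Language Understanding of Plant Diseases | Proposes LeafNet, a large-scale vision-language dataset for plant diseases, together with the LeafBench benchmark, to advance multimodal understanding in agriculture. | foundation model, multimodal | |
| 6 | AdaVBoost: Mitigating Hallucinations in LVLMs via Token-Level Adaptive Visual Attention Boosting | Proposes AdaVBoost to mitigate hallucinations in LVLMs. | visual grounding | |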

🔬 Pillar 2: RL & Architecture (4 papers)

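| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 7 | Skeleton2Stage: Reward-Guided Fine-Tuning for Physically Plausible Dance Generation | Proposes Skeleton2Stage, which fine-tunes a diffusion model with reinforcement learning to improve physical plausibility in dance generation. | reinforcement learning, reward design, motion synthesis | |
| 8 | Embed-RL: Reinforcement Learning for Reasoning-Driven Multimodal Embeddings | Proposes the Embed-RL framework for reasoning-driven multimodal embeddings. | reinforcement learning, large language model, multimodal | |
| 9 | A generalizable foundation model for intraoperative understanding across surgical procedures | ZEN: a general-purpose foundation model for intraoperative understanding that generalizes across surgical procedures. | representation learning, distillation, scene understanding | |
| 10 | Prior-guided Hierarchical Instance-pixel Contrastive Learning for Ultrasound Speckle Noise Suppression | Proposes a prior-guided hierarchical instance-pixel contrastive learning method for ultrasound speckle noise suppression. | contrastive learning | |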

🔬 Pillar 4: Generative Motion (2 papers)

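| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 11 | T2MBench: A Benchmark for Out-of-Distribution Text-to-Motion Generation | Proposes the T2MBench benchmark for evaluating how well text-to-motion generation models generalize to out-of-distribution scenarios. | text-to-motion, motion generation | |
| 12 | VAR-3D: View-aware Auto-Regressive Model for Text-to-3D Generation via a 3D Tokenizer | Proposes VAR-3D, which uses a view-aware autoregressive approach to improve the quality and consistency of text-to-3D generation. | VQ-VAE | |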

🔬 Pillar 3: Perception & Semantics (2 papers)

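| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 13 | Nighttime Autonomous Driving Scene Reconstruction with Physically-Based Gaussian Splatting | Proposes a physically based Gaussian splatting reconstruction method for nighttime autonomous driving scenes. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 14 | Joint Orientation and Weight Optimization for Robust Watertight Surface Reconstruction via Dirichlet-Regularized Winding Fields | Proposes DiWR, which achieves robust watertight surface reconstruction via Dirichlet-regularized winding fields. | 3D gaussian splatting, gaussian splatting, splatting | |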

🔬 Pillar 8: Physics-based Animation (2 papers)

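| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 15 | EchoTorrent: Towards Swift, Sustained, and Streaming Multi-Modal Video Generation | EchoTorrent: a new framework for fast, stable, and streaming multimodal video generation. | spatiotemporal, multimodal | |
| 16 | Low-Pass Filtering Improves Behavioral Alignment of Vision Models | Low-pass filtering significantly improves the alignment of vision models with human visual behavior. | spatiotemporal | |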

🔬 Pillar 1: Robot Control (1 paper)

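| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 17 | Gaussian Sequences with Multi-Scale Dynamics for 4D Reconstruction from Monocular Casual Videos | Proposes a 4D reconstruction method for monocular videos based on Gaussian sequences with multi-scale dynamics. | manipulation, physically plausible, foundation model | |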

🔬 Pillar 7: Motion Retargeting (1 paper)

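| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 18 | RPGD: RANSAC-P3P Gradient Descent for Extrinsic Calibration in 3D Human Pose Estimation | Proposes the RPGD framework for robust extrinsic calibration in 3D human pose estimation. | human motion | |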
