cs.CV (2026-01-28)

📊 22 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (8 🔗3) · Pillar 2: RL & Architecture (6) · Pillar 3: Perception & Semantics (4 🔗1) · Pillar 1: Robot Control (2) · Pillar 4: Generative Motion (1) · Pillar 7: Motion Retargeting (1)

🔬 Pillar 9: Embodied Foundation Models (8 papers)

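| # | Title | One-line summary | Tags | 🔗 |
| 1 | GDCNet: Generative Discrepancy Comparison Network for Multimodal Sarcasm Detection | GDCNet: a generative discrepancy comparison network for multimodal sarcasm detection | large language model, multimodal | |
| 2 | AnomalyVFM -- Transforming Vision Foundation Models into Zero-Shot Anomaly Detectors | Proposes AnomalyVFM to address zero-shot anomaly detection | foundation model | |
| 3 | Automated Marine Biofouling Assessment: Benchmarking Computer Vision and Multimodal LLMs on the Level of Fouling Scale | Uses computer vision and multimodal LLMs to automatically assess the level of biofouling on ships | multimodal | |
| 4 | Context Tokens are Anchors: Understanding the Repetition Curse in dMLLMs from an Information Flow Perspective | Proposes CoTA, which mitigates repetitive generation in dMLLMs by strengthening the information flow of context tokens | large language model, multimodal | |
| 5 | A New Dataset and Framework for Robust Road Surface Classification via Camera-IMU Fusion | Proposes a camera-IMU fusion framework for robust road surface classification, along with the ROAD dataset | multimodal | |
| 6 | StructAlign: Structured Cross-Modal Alignment for Continual Text-to-Video Retrieval | StructAlign: a structured cross-modal alignment method for continual text-to-video retrieval | multimodal | |
| 7 | Hallucination Begins Where Saliency Drops | Proposes the LVLMs-Saliency framework, which uses saliency guidance to reduce hallucination in large vision-language models | visual grounding | |
| 8 | Efficient Token Pruning for LLaDA-V | Proposes an efficient token pruning strategy for LLaDA-V that significantly reduces computational cost | multimodal | |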

🔬 Pillar 2: RL & Architecture (6 papers)

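| # | Title | One-line summary | Tags | 🔗 |
| 9 | MARE: Multimodal Alignment and Reinforcement for Explainable Deepfake Detection via Vision-Language Models | Proposes MARE, which achieves explainable deepfake detection through multimodal alignment and reinforcement learning | reinforcement learning, RLHF, multimodal | |
| 10 | Compression Tells Intelligence: Visual Coding, Visual Token Technology, and the Unification | Unifies visual coding and visual token technology, offering efficient compression for multimodal large models and embodied intelligence | representation learning, embodied AI, large language model | |
| 11 | MMSF: Multitask and Multimodal Supervised Framework for WSI Classification and Survival Analysis | MMSF: a multitask, multimodal supervised framework for WSI classification and survival analysis | Mamba, multimodal | |
| 12 | Decoupling Perception and Calibration: Label-Efficient Image Quality Assessment Framework | Proposes the LEAF framework, which decouples perception from calibration for label-efficient image quality assessment | distillation, large language model, multimodal | |
| 13 | Advancing Open-source World Models | LingBot-World: an open-source world model with high fidelity, long-term memory, and real-time interaction | world model | |
| 14 | RAW-Flow: Advancing RGB-to-RAW Image Reconstruction with Deterministic Latent Flow Matching | Proposes RAW-Flow, which achieves high-quality RGB-to-RAW image reconstruction via deterministic latent flow matching | flow matching | |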

🔬 Pillar 3: Perception & Semantics (4 papers)

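| # | Title | One-line summary | Tags | 🔗 |
| 15 | Open-Vocabulary Functional 3D Human-Scene Interaction Generation | Proposes the FunHSI framework for open-vocabulary functional 3D human-scene interaction generation | open-vocabulary, open vocabulary, physically plausible | |
| 16 | FreeFix: Boosting 3D Gaussian Splatting via Fine-Tuning-Free Diffusion Models | FreeFix: improves 3D Gaussian Splatting rendering quality with fine-tuning-free diffusion models | 3D gaussian splatting, gaussian splatting, splatting | |
| 17 | GVGS: Gaussian Visibility-Aware Multi-View Geometry for Accurate Surface Reconstruction | GVGS: Gaussian visibility-aware multi-view geometry for accurate surface reconstruction | monocular depth, 3D gaussian splatting, gaussian splatting | |
| 18 | Physically Guided Visual Mass Estimation from a Single RGB Image | Proposes a physically guided framework for estimating object mass from a single RGB image, improving mass prediction accuracy | depth estimation, monocular depth | |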

🔬 Pillar 1: Robot Control (2 papers)

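| # | Title | One-line summary | Tags | 🔗 |
| 19 | CURVE: Learning Causality-Inspired Invariant Representations for Robust Scene Understanding via Uncertainty-Guided Regularization | Proposes the CURVE framework, which learns causally invariant representations via uncertainty-guided regularization to improve the robustness of scene understanding | sim-to-real, scene understanding, zero-shot transfer | |
| 20 | Quartet of Diffusions: Structure-Aware Point Cloud Generation through Part and Symmetry Guidance | Proposes Quartet of Diffusions, which achieves structure-aware point cloud generation through part and symmetry guidance | manipulation | |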

🔬 Pillar 4: Generative Motion (1 paper)

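| # | Title | One-line summary | Tags | 🔗 |
| 21 | HINT: Hierarchical Interaction Modeling for Autoregressive Multi-Human Motion Generation | HINT: a hierarchical interaction modeling framework for autoregressive multi-human motion generation | motion generation, human motion | |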

🔬 Pillar 7: Motion Retargeting (1 paper)

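| # | Title | One-line summary | Tags | 🔗 |
| 22 | Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models | Proposes the SpatialGenEval benchmark and the SpatialT2I dataset to improve the spatial intelligence of text-to-image models | spatial relationship, foundation model | |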
