cs.CV (2026-01-23)

📊 19 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (7 🔗1) · Pillar 3: Perception & Semantics (5 🔗2) · Pillar 2: RL & Architecture (4 🔗1) · Pillar 1: Robot Control (3 🔗2)

🔬 Pillar 9: Embodied Foundation Models (7 papers)

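| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 1 | TangramPuzzle: Evaluating Multimodal Large Language Models with Compositional Spatial Reasoning | Proposes the TangramPuzzle benchmark to evaluate the compositional spatial reasoning of multimodal large language models. | large language model, multimodal | |
| 2 | OnlineSI: Taming Large Language Model for Online 3D Understanding and Grounding | Proposes the OnlineSI framework, which uses a large language model for continuous online 3D scene understanding and grounding. | large language model, multimodal | |
| 3 | Emotion-LLaMAv2 and MMEVerse: A New Framework and Benchmark for Multimodal Emotion Understanding | Emotion-LLaMAv2: an end-to-end framework and benchmark for multimodal emotion understanding. | large language model, multimodal | |
| 4 | VISTA-PATH: An interactive foundation model for pathology image segmentation and quantitative analysis in computational pathology | VISTA-PATH: an interactive foundation model for pathology image segmentation and quantitative analysis. | foundation model | |
| 5 | Order from Chaos: Physical World Understanding from Glitchy Gameplay Videos | Uses glitches in gameplay videos to build PhysGame, a dataset for physical world understanding, and the GameBench benchmark. | large language model, multimodal | |
| 6 | ResAgent: Entropy-based Prior Point Discovery and Visual Reasoning for Referring Expression Segmentation | ResAgent: proposes entropy-based prior point discovery and visual reasoning for referring expression segmentation. | large language model, multimodal | |
| 7 | X-Aligner: Composed Visual Retrieval without the Bells and Whistles | Proposes X-Aligner for composed video retrieval, reaching SOTA without complex design. | multimodal | |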

🔬 Pillar 3: Perception & Semantics (5 papers)

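| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 8 | A Step to Decouple Optimization in 3DGS | Decoupling 3DGS optimization: proposes AdamW-GS to improve optimization efficiency and representational capacity. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 9 | GPA-VGGT: Adapting VGGT to Large scale Localization by self-Supervised learning with Geometry and Physics Aware loss | Proposes GPA-VGGT, trained by self-supervised learning with a geometry- and physics-aware loss, to improve large-scale localization. | scene understanding, VGGT | |
| 10 | AnyView: Synthesizing Any Novel View in Dynamic Scenes | AnyView: proposes a diffusion-based framework for synthesizing arbitrary novel views of dynamic scenes. | implicit representation, spatiotemporal | |
| 11 | AnchoredDream: Zero-Shot 360° Indoor Scene Generation from a Single View via Geometric Grounding | AnchoredDream: zero-shot 360° indoor scene generation from a single view, based on geometric constraints. | depth estimation | |
| 12 | Multi-View Consistent Wound Segmentation With Neural Fields | Proposes WoundNeRF for multi-view consistent wound segmentation, improving 3D reconstruction accuracy. | NeRF | |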

🔬 Pillar 2: RL & Architecture (4 papers)

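| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 13 | Incorporating Eye-Tracking Signals Into Multimodal Deep Visual Models For Predicting User Aesthetic Experience In Residential Interiors | Proposes a dual-branch CNN-LSTM model that fuses eye-tracking signals to predict the aesthetic experience of residential interior designs. | privileged information, multimodal | |
| 14 | PanopMamba: Vision State Space Modeling for Nuclei Panoptic Segmentation | PanopMamba: vision state space modeling for nuclei panoptic segmentation. | Mamba, SSM, state space model | |
| 15 | Flow Matching for Probabilistic Monocular 3D Human Pose Estimation | FMPose: probabilistic monocular 3D human pose estimation based on flow matching. | flow matching | |
| 16 | SALAD: Achieve High-Sparsity Attention via Efficient Linear Attention Tuning for Video Diffusion Transformer | Proposes SALAD to address the high computational complexity of video generation. | linear attention | |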

🔬 Pillar 1: Robot Control (3 papers)

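| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 17 | Edge-Aware Image Manipulation via Diffusion Models with a Novel Structure-Preservation Loss | Proposes a diffusion model with a structure-preservation loss for edge-aware image editing. | manipulation, structure preservation | |
| 18 | VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents | VisGym: diverse, customizable, scalable environments for multimodal agents. | manipulation, multimodal | |
| 19 | ReWeaver: Towards Simulation-Ready and Topology-Accurate Garment Reconstruction | ReWeaver: proposes a topology-accurate garment reconstruction framework suitable for physical simulation. | manipulation, sim-to-real | |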
