cs.CV (2026-01-23)

📊 19 papers in total | 🔗 6 with code

🎯 Navigation by area of interest

Pillar 9: Embodied Foundation Models (7 🔗1) Pillar 3: Perception & Semantics (5 🔗2) Pillar 2: RL & Architecture (4 🔗1) Pillar 1: Robot Control (3 🔗2)

🔬 Pillar 9: Embodied Foundation Models (7 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	TangramPuzzle: Evaluating Multimodal Large Language Models with Compositional Spatial Reasoning	Introduces the TangramPuzzle benchmark to evaluate the compositional spatial reasoning ability of multimodal large language models.	large language model multimodal
2	OnlineSI: Taming Large Language Model for Online 3D Understanding and Grounding	Proposes the OnlineSI framework, which uses a large language model for continuous online 3D scene understanding and grounding.	large language model multimodal
3	Emotion-LLaMAv2 and MMEVerse: A New Framework and Benchmark for Multimodal Emotion Understanding	Emotion-LLaMAv2: an end-to-end framework and benchmark for multimodal emotion understanding.	large language model multimodal
4	VISTA-PATH: An interactive foundation model for pathology image segmentation and quantitative analysis in computational pathology	VISTA-PATH: an interactive foundation model for pathology image segmentation and quantitative analysis.	foundation model	✅
5	Order from Chaos: Physical World Understanding from Glitchy Gameplay Videos	Uses glitches in gameplay videos to build the PhysGame dataset and the GameBench benchmark for physical world understanding.	large language model multimodal
6	ResAgent: Entropy-based Prior Point Discovery and Visual Reasoning for Referring Expression Segmentation	ResAgent: entropy-based prior point discovery and visual reasoning for referring expression segmentation.	large language model multimodal
7	X-Aligner: Composed Visual Retrieval without the Bells and Whistles	Proposes X-Aligner for composed video retrieval, reaching SOTA without complex design.	multimodal

🔬 Pillar 3: Perception & Semantics (5 papers)

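#	Title	One-line summary	Tags	🔗	⭐
8	A Step to Decouple Optimization in 3DGS	Decouples 3DGS optimization: proposes AdamW-GS to improve optimization efficiency and representational capacity.	3D gaussian splatting 3DGS gaussian splatting
9	GPA-VGGT:Adapting VGGT to Large scale Localization by self-Supervised learning with Geometry and Physics Aware loss	Proposes GPA-VGGT, trained by self-supervised learning with a geometry- and physics-aware loss, to improve large-scale localization.	scene understanding VGGT	✅
10	AnyView: Synthesizing Any Novel View in Dynamic Scenes	AnyView: a diffusion-based framework for synthesizing arbitrary novel views in dynamic scenes.	implicit representation spatiotemporal	✅
11	AnchoredDream: Zero-Shot 360° Indoor Scene Generation from a Single View via Geometric Grounding	AnchoredDream: zero-shot 360° indoor scene generation from a single view, based on geometric constraints.	depth estimation
12	Multi-View Consistent Wound Segmentation With Neural Fields	Proposes WoundNeRF for multi-view consistent wound segmentation, improving 3D reconstruction accuracy.	NeRF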

🔬 Pillar 2: RL & Architecture (4 papers)

#	Title	One-line summary	Tags	🔗	⭐
13	Incorporating Eye-Tracking Signals Into Multimodal Deep Visual Models For Predicting User Aesthetic Experience In Residential Interiors	Proposes a dual-branch CNN-LSTM model that fuses eye-tracking signals to predict the aesthetic experience of residential interior design.	privileged information multimodal
14	PanopMamba: Vision State Space Modeling for Nuclei Panoptic Segmentation	PanopMamba: vision state space modeling for nuclei panoptic segmentation.	Mamba SSM state space model	✅
15	Flow Matching for Probabilistic Monocular 3D Human Pose Estimation	FMPose: probabilistic monocular 3D human pose estimation based on flow matching.	flow matching
16	SALAD: Achieve High-Sparsity Attention via Efficient Linear Attention Tuning for Video Diffusion Transformer	Proposes SALAD to address the high computational complexity of video generation.	linear attention

🔬 Pillar 1: Robot Control (3 papers)

#	Title	One-line summary	Tags	🔗	⭐
17	Edge-Aware Image Manipulation via Diffusion Models with a Novel Structure-Preservation Loss	Proposes a diffusion model with a structure-preservation loss for edge-aware image editing.	manipulation structure preservation	✅
18	VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents	VisGym: diverse, customizable, and scalable environments for multimodal agents.	manipulation multimodal	✅
19	ReWeaver: Towards Simulation-Ready and Topology-Accurate Garment Reconstruction	ReWeaver: a topology-accurate garment reconstruction framework suited to physical simulation.	manipulation sim-to-real
