cs.CV (2026-01-19)

📊 29 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (9 🔗2) · Pillar 3: Perception & Semantics (8 🔗1) · Pillar 2: RL & Architecture (5 🔗2) · Pillar 1: Robot Control (4 🔗1) · Pillar 4: Generative Motion (1) · Pillar 5: Interaction & Reaction (1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 9: Embodied Foundation Models (9 papers)

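# | Title | One-sentence summary | Tags
1 | MultiST: A Cross-Attention-Based Multimodal Model for Spatial Transcriptomic | MultiST: a cross-attention-based multimodal model for spatial transcriptomics analysis. | multimodal
2 | From 100,000+ images to winning the first brain MRI foundation model challenges: Sharing lessons and models | Proposes a U-Net CNN-based approach to brain MRI analysis that won the SSL3D and FOMO25 challenges. | foundation model
3 | Early Prediction of Type 2 Diabetes Using Multimodal data and Tabular Transformers | Proposes a Tabular Transformer-based model for early T2DM prediction that uses multimodal data to improve prediction accuracy. | multimodal
4 | Enginuity: Building an Open Multi-Domain Dataset of Complex Engineering Diagrams | Enginuity: builds a large-scale, open, multi-domain dataset of engineering diagrams to advance diagram parsing and AI-assisted engineering. | large language model, multimodal
5 | DC-VLAQ: Query-Residual Aggregation for Robust Visual Place Recognition | Proposes DC-VLAQ, which achieves robust visual place recognition through query-residual aggregation. | foundation model
6 | GTPred: Benchmarking MLLMs for Interpretable Geo-localization and Time-of-capture Prediction | GTPred: a benchmark of multimodal large language models for interpretable geo-localization and time-of-capture prediction. | large language model
7 | TVWorld: Foundations for Remote-Control TV Agents | Proposes TVWorld to address navigation for remote-control TV agents. | foundation model
8 | Earth Embeddings as Products: Taxonomy, Ecosystem, and Standardized Access | Proposes a taxonomy and a standardized API for Earth embedding products to promote the use of geospatial foundation models. | foundation model
9 | Fusing in 3D: Free-Viewpoint Fusion Rendering with a 3D Infrared-Visible Scene Representation | Proposes a free-viewpoint fusion rendering method based on a 3D infrared-visible scene representation. | multimodal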

🔬 Pillar 3: Perception & Semantics (8 papers)

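# | Title | One-sentence summary | Tags
10 | GaussExplorer: 3D Gaussian Splatting for Embodied Exploration and Reasoning | GaussExplorer: a 3D Gaussian Splatting-based framework for embodied exploration and reasoning. | 3D gaussian splatting, 3DGS, gaussian splatting
11 | TreeDGS: Aerial Gaussian Splatting for Distant DBH Measurement | TreeDGS: an aerial Gaussian splatting method for measuring diameter at breast height (DBH) from a distance. | 3D gaussian splatting, gaussian splatting, splatting
12 | Open Vocabulary Panoptic Segmentation With Retrieval Augmentation | RetCLIP: proposes a retrieval-augmented open-vocabulary panoptic segmentation method that improves segmentation of unseen classes. | open-vocabulary, open vocabulary
13 | GaussianTrimmer: Online Trimming Boundaries for 3DGS Segmentation | GaussianTrimmer: proposes an online boundary-trimming method that improves boundary quality in 3D Gaussian segmentation. | 3DGS
14 | KaoLRM: Repurposing Pre-trained Large Reconstruction Models for Parametric 3D Face Reconstruction | KaoLRM: uses pre-trained large reconstruction models for parametric 3D face reconstruction. | gaussian splatting, splatting
15 | ICo3D: An Interactive Conversational 3D Virtual Human | ICo3D: proposes a method for generating an interactive conversational 3D virtual human with realistic real-time interaction. | splatting
16 | Moaw: Unleashing Motion Awareness for Video Diffusion Models | Moaw: unleashes the motion awareness of video diffusion models to enable zero-shot motion transfer. | optical flow
17 | Near-Light Color Photometric Stereo for mono-Chromaticity non-lambertian surface | Proposes a near-light color photometric stereo method for mono-chromaticity non-Lambertian surfaces, achieving high-accuracy surface reconstruction from a single image. | implicit representation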

🔬 Pillar 2: RL & Architecture (5 papers)

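# | Title | One-sentence summary | Tags
18 | A Generalist Foundation Model for Total-body PET/CT Enables Diagnostic Reporting and System-wide Metabolic Profiling | SDF-HOLO: a generalist foundation model for total-body PET/CT that enables diagnostic reporting and system-wide metabolic profiling. | representation learning, foundation model, multimodal
19 | CausalSpatial: A Benchmark for Object-Centric Causal Spatial Reasoning | Proposes the CausalSpatial benchmark to evaluate multimodal large language models on causal spatial reasoning. | world model, large language model, multimodal
20 | Towards Unbiased Source-Free Object Detection via Vision Foundation Models | Proposes the DSOD framework, which uses vision foundation models to address source-domain bias in source-free object detection. | distillation, foundation model
21 | ConvMambaNet: A Hybrid CNN-Mamba State Space Architecture for Accurate and Real-Time EEG Seizure Detection | ConvMambaNet: a hybrid CNN-Mamba state space architecture for accurate, real-time EEG seizure detection. | Mamba, SSM, state space model
22 | Think3D: Thinking with Space for Spatial Reasoning | Think3D: uses spatial reasoning to strengthen the spatial understanding of large vision models. | reinforcement learning, multimodal, chain-of-thought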

🔬 Pillar 1: Robot Control (4 papers)

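# | Title | One-sentence summary | Tags
23 | CSGaussian: Progressive Rate-Distortion Compression and Segmentation for 3D Gaussian Splatting | CSGaussian: a unified framework for progressive rate-distortion compression and segmentation of 3D Gaussian Splatting. | manipulation, 3D gaussian splatting, 3DGS
24 | Spatial-VLN: Zero-Shot Vision-and-Language Navigation With Explicit Spatial Perception and Exploration | Spatial-VLN: achieves zero-shot vision-and-language navigation through explicit spatial perception and exploration. | sim2real, VLN, large language model
25 | TwoHead-SwinFPN: A Unified DL Architecture for Synthetic Manipulation, Detection and Localization in Identity Documents | TwoHead-SwinFPN: a unified deep learning architecture for detecting and localizing synthetic manipulation in identity documents. | manipulation
26 | Exploring Talking Head Models With Adjacent Frame Prior for Speech-Preserving Facial Expression Manipulation | Proposes the THFEM framework, which combines speech-driven talking head generation models with expression manipulation to improve lip-sync accuracy. | manipulation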

🔬 Pillar 4: Generative Motion (1 paper)

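# | Title | One-sentence summary | Tags
27 | A Semantic Decoupling-Based Two-Stage Rainy-Day Attack for Revealing Weather Robustness Deficiencies in Vision-Language Models | Proposes a semantic decoupling-based two-stage rainy-day attack framework that reveals weather robustness deficiencies in vision-language models. | physically plausible, multimodal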

🔬 Pillar 5: Interaction & Reaction (1 paper)

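# | Title | One-sentence summary | Tags
28 | Dual-Stream Collaborative Transformer for Image Captioning | Proposes a Dual-Stream Collaborative Transformer (DSCT) to address insufficient contextual information in image captioning. | mutual attention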

🔬 Pillar 8: Physics-based Animation (1 paper)

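# | Title | One-sentence summary | Tags
29 | Deep Learning for Semantic Segmentation of 3D Ultrasound Data | Proposes a 3D U-Net-based semantic segmentation framework for 3D ultrasound data, intended for autonomous driving in harsh environments. | PULSE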
