cs.CV (2026-01-05)

📊 26 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (8 🔗2) · Pillar 9: Embodied Foundation Models (8 🔗1) · Pillar 2: RL Algorithms & Architecture (5) · Pillar 1: Robot Control (2) · Pillar 6: Video Extraction & Matching (2 🔗1) · Pillar 5: Interaction & Reaction (1 🔗1)

🔬 Pillar 3: Spatial Perception & Semantics (8 papers)

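| # | Title | One-sentence summary | Tags | 🔗 |
| --- | --- | --- | --- | --- |
| 1 | 360-GeoGS: Geometrically Consistent Feed-Forward 3D Gaussian Splatting Reconstruction for 360 Images | Proposes 360-GeoGS for geometrically consistent feed-forward 3D Gaussian Splatting reconstruction from 360 images | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 2 | ESGaussianFace: Emotional and Stylized Audio-Driven Facial Animation via 3D Gaussian Splatting | ESGaussianFace: emotional and stylized audio-driven facial animation using 3D Gaussian Splatting | 3D gaussian splatting, gaussian splatting, splatting | |
| 3 | Adapting Depth Anything to Adverse Imaging Conditions with Events | ADAE: uses an event camera to strengthen Depth Anything's depth estimation under adverse imaging conditions | depth estimation, Depth Anything, spatiotemporal | |
| 4 | Leveraging 2D-VLM for Label-Free 3D Segmentation in Large-Scale Outdoor Scene Understanding | Uses a 2D-VLM for label-free 3D segmentation in large-scale outdoor scenes | scene understanding, open-vocabulary, open vocabulary | |
| 5 | 360DVO: Deep Visual Odometry for Monocular 360-Degree Camera | Proposes 360DVO, a deep-learning-based visual odometry framework for monocular panoramic cameras | visual odometry | |
| 6 | InpaintHuman: Reconstructing Occluded Humans with Multi-Scale UV Mapping and Identity-Preserving Diffusion Inpainting | InpaintHuman: proposes multi-scale UV mapping and identity-preserving diffusion inpainting to reconstruct occluded human avatars | 3D gaussian splatting, gaussian splatting, splatting | |
| 7 | Joint Semantic and Rendering Enhancements in 3D Gaussian Modeling with Anisotropic Local Encoding | Proposes a joint semantic and rendering enhancement method for 3D Gaussian models based on anisotropic local encoding | 3DGS | |
| 8 | InfiniteVGGT: Visual Geometry Grounded Transformer for Endless Streams | Proposes InfiniteVGGT to address long-duration 3D visual geometry understanding | VGGT | |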

🔬 Pillar 9: Embodied Foundation Models (8 papers)

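| # | Title | One-sentence summary | Tags | 🔗 |
| --- | --- | --- | --- | --- |
| 9 | SLGNet: Synergizing Structural Priors and Language-Guided Modulation for Multimodal Object Detection | SLGNet: multimodal object detection that fuses structural priors with language guidance, improving robustness in all-weather scenes | foundation model, multimodal | |
| 10 | Mind the Gap: Continuous Magnification Sampling for Pathology Foundation Models | Proposes continuous magnification sampling to improve pathology foundation model performance across magnifications | foundation model | |
| 11 | VINO: A Unified Visual Generator with Interleaved OmniModal Context | VINO: a unified visual generator that generates and edits images and videos through interleaved omnimodal context | multimodal, instruction following | |
| 12 | CogFlow: Bridging Perception and Reasoning through Knowledge Internalization for Visual Mathematical Problem Solving | CogFlow: bridges perception and reasoning through knowledge internalization to solve visual mathematical problems | large language model, multimodal | |
| 13 | Causality-Aware Temporal Projection for Video Understanding in Video-LLMs | V-CORE: introduces causality-aware temporal projection into Video-LLMs for video understanding | large language model, multimodal | |
| 14 | Prithvi-Complimentary Adaptive Fusion Encoder (CAFE): unlocking full-potential for flood inundation mapping | Proposes Prithvi-CAFE to address the capture of local detail in flood inundation mapping | foundation model | |
| 15 | BiPrompt: Bilateral Prompt Optimization for Visual and Textual Debiasing in Vision-Language Models | BiPrompt: bilateral prompt optimization for visual and textual debiasing in vision-language models | foundation model | |
| 16 | AR-MOT: Autoregressive Multi-object Tracking | Proposes AR-MOT, an autoregressive multi-object tracking framework built on a large language model, enabling more flexible task generalization | large language model | |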

🔬 Pillar 2: RL Algorithms & Architecture (5 papers)

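| # | Title | One-sentence summary | Tags | 🔗 |
| --- | --- | --- | --- | --- |
| 17 | NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation | NextFlow: unified sequential modeling activates multimodal understanding and generation capabilities | reinforcement learning, multimodal | |
| 18 | Agentic AI in Remote Sensing: Foundations, Taxonomy, and Emerging Systems | Survey paper: foundations, taxonomy, and emerging systems of agentic AI in remote sensing | representation learning, large language model, foundation model | |
| 19 | HeadLighter: Disentangling Illumination in Generative 3D Gaussian Heads via Lightstage Captures | HeadLighter: disentangles illumination in generative 3D Gaussian heads via light stage captures | distillation, 3D gaussian splatting, gaussian splatting | |
| 20 | Enhancing Object Detection with Privileged Information: A Model-Agnostic Teacher-Student Approach | Proposes a teacher-student approach that uses privileged information to improve object detection performance | teacher-student, privileged information | |
| 21 | Point-SRA: Self-Representation Alignment for 3D Representation Learning | Point-SRA: 3D representation learning via self-representation alignment | representation learning, masked autoencoder, MAE | |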

🔬 Pillar 1: Robot Control (2 papers)

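| # | Title | One-sentence summary | Tags | 🔗 |
| --- | --- | --- | --- | --- |
| 22 | Talk2Move: Reinforcement Learning for Text-Instructed Object-Level Geometric Transformation in Scenes | Proposes Talk2Move to address text-instructed geometric transformation of objects | manipulation, reinforcement learning, multimodal | |
| 23 | TalkPhoto: A Versatile Training-Free Conversational Assistant for Intelligent Image Editing | Proposes TalkPhoto, a training-free, general-purpose conversational image editing assistant | manipulation, large language model, multimodal | |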

🔬 Pillar 6: Video Extraction & Matching (2 papers)

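| # | Title | One-sentence summary | Tags | 🔗 |
| --- | --- | --- | --- | --- |
| 24 | Robust Egocentric Visual Attention Prediction Through Language-guided Scene Context-aware Learning | Proposes a language-guided, scene context-aware learning framework that improves the robustness of egocentric visual attention prediction | egocentric, Ego4D | |
| 25 | DiffProxy: Multi-View Human Mesh Recovery via Diffusion-Generated Dense Proxies | DiffProxy: multi-view human mesh recovery using dense proxies generated by a diffusion model | human mesh recovery | |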

🔬 Pillar 5: Interaction & Reaction (1 paper)

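| # | Title | One-sentence summary | Tags | 🔗 |
| --- | --- | --- | --- | --- |
| 26 | MagicFight: Personalized Martial Arts Combat Video Generation | MagicFight: proposes a personalized martial arts combat video generation method, filling a gap in two-person interaction video generation | two-person interaction | |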
