cs.CV (2026-01-05)

📊 26 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (8 🔗2) · Pillar 9: Embodied Foundation Models (8 🔗1) · Pillar 2: RL & Architecture (5) · Pillar 1: Robot Control (2) · Pillar 6: Video Extraction (2 🔗1) · Pillar 5: Interaction & Reaction (1 🔗1)

🔬 Pillar 3: Perception & Semantics (8 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
1	360-GeoGS: Geometrically Consistent Feed-Forward 3D Gaussian Splatting Reconstruction for 360 Images	Proposes 360-GeoGS for geometrically consistent feed-forward 3D Gaussian Splatting reconstruction from 360 images	3D gaussian splatting 3DGS gaussian splatting
2	ESGaussianFace: Emotional and Stylized Audio-Driven Facial Animation via 3D Gaussian Splatting	ESGaussianFace: emotional and stylized audio-driven facial animation using 3D Gaussian Splatting	3D gaussian splatting gaussian splatting splatting
3	Adapting Depth Anything to Adverse Imaging Conditions with Events	ADAE: uses an event camera to strengthen Depth Anything's depth estimation under adverse imaging conditions	depth estimation Depth Anything spatiotemporal
4	Leveraging 2D-VLM for Label-Free 3D Segmentation in Large-Scale Outdoor Scene Understanding	Uses a 2D-VLM for label-free 3D segmentation in large-scale outdoor scenes	scene understanding open-vocabulary open vocabulary
5	360DVO: Deep Visual Odometry for Monocular 360-Degree Camera	Proposes 360DVO, a deep-learning-based visual odometry framework for monocular 360-degree cameras	visual odometry	✅
6	InpaintHuman: Reconstructing Occluded Humans with Multi-Scale UV Mapping and Identity-Preserving Diffusion Inpainting	InpaintHuman: proposes multi-scale UV mapping and identity-preserving diffusion inpainting to reconstruct occluded human avatars	3D gaussian splatting gaussian splatting splatting
7	Joint Semantic and Rendering Enhancements in 3D Gaussian Modeling with Anisotropic Local Encoding	Proposes joint semantic and rendering enhancements for 3D Gaussian modeling based on anisotropic local encoding	3DGS
8	InfiniteVGGT: Visual Geometry Grounded Transformer for Endless Streams	Proposes InfiniteVGGT to address long-duration 3D visual geometry understanding	VGGT	✅

🔬 Pillar 9: Embodied Foundation Models (8 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
9	SLGNet: Synergizing Structural Priors and Language-Guided Modulation for Multimodal Object Detection	SLGNet: multimodal object detection that fuses structural priors with language guidance, improving robustness in all-weather scenes	foundation model multimodal
10	Mind the Gap: Continuous Magnification Sampling for Pathology Foundation Models	Proposes continuous magnification sampling to improve pathology foundation model performance across magnifications	foundation model
11	VINO: A Unified Visual Generator with Interleaved OmniModal Context	VINO: a unified visual generator that generates and edits images and videos through interleaved omnimodal context	multimodal instruction following
12	CogFlow: Bridging Perception and Reasoning through Knowledge Internalization for Visual Mathematical Problem Solving	CogFlow: bridges perception and reasoning through knowledge internalization to solve visual mathematical problems	large language model multimodal
13	Causality-Aware Temporal Projection for Video Understanding in Video-LLMs	V-CORE: introduces causality-aware temporal projection into Video-LLMs for video understanding	large language model multimodal
14	Prithvi-Complimentary Adaptive Fusion Encoder (CAFE): unlocking full-potential for flood inundation mapping	Proposes Prithvi-CAFE to address local detail capture in flood inundation mapping	foundation model	✅
15	BiPrompt: Bilateral Prompt Optimization for Visual and Textual Debiasing in Vision-Language Models	BiPrompt: bilateral prompt optimization for visual and textual debiasing in vision-language models	foundation model
16	AR-MOT: Autoregressive Multi-object Tracking	Proposes AR-MOT, an autoregressive multi-object tracking framework built on a large language model, for more flexible task generalization	large language model

🔬 Pillar 2: RL & Architecture (5 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
17	NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation	NextFlow: unified sequential modeling activates multimodal understanding and generation	reinforcement learning multimodal
18	Agentic AI in Remote Sensing: Foundations, Taxonomy, and Emerging Systems	Survey paper: foundations, taxonomy, and emerging systems of agentic AI in remote sensing	representation learning large language model foundation model
19	HeadLighter: Disentangling Illumination in Generative 3D Gaussian Heads via Lightstage Captures	HeadLighter: disentangles illumination in generative 3D Gaussian heads via light stage captures	distillation 3D gaussian splatting gaussian splatting
20	Enhancing Object Detection with Privileged Information: A Model-Agnostic Teacher-Student Approach	Proposes a teacher-student approach that uses privileged information to improve object detection	teacher-student privileged information
21	Point-SRA: Self-Representation Alignment for 3D Representation Learning	Point-SRA: 3D representation learning through self-representation alignment	representation learning masked autoencoder MAE

🔬 Pillar 1: Robot Control (2 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
22	Talk2Move: Reinforcement Learning for Text-Instructed Object-Level Geometric Transformation in Scenes	Proposes Talk2Move for text-instructed geometric transformation of objects	manipulation reinforcement learning multimodal
23	TalkPhoto: A Versatile Training-Free Conversational Assistant for Intelligent Image Editing	Proposes TalkPhoto, a training-free, general-purpose conversational assistant for image editing	manipulation large language model multimodal

🔬 Pillar 6: Video Extraction (2 papers)

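#	Title	One-Sentence Summary	Tags	🔗	⭐
24	Robust Egocentric Visual Attention Prediction Through Language-guided Scene Context-aware Learning	Proposes a language-guided, scene-context-aware learning framework that improves the robustness of egocentric visual attention prediction	egocentric Ego4D
25	DiffProxy: Multi-View Human Mesh Recovery via Diffusion-Generated Dense Proxies	DiffProxy: multi-view human mesh recovery using dense proxies generated by a diffusion model	human mesh recovery	✅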

🔬 Pillar 5: Interaction & Reaction (1 paper)

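#	Title	One-Sentence Summary	Tags	🔗	⭐
26	MagicFight: Personalized Martial Arts Combat Video Generation	MagicFight: proposes a personalized martial arts combat video generation method, filling a gap in two-person interaction video generation	two-person interaction	✅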
