cs.CV (2025-04-27)

📊 12 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (4) · Pillar 9: Embodied Foundation Models (3 🔗2) · Pillar 2: RL & Architecture (3) · Pillar 1: Robot Control (1) · Pillar 4: Generative Motion (1 🔗1)

🔬 Pillar 3: Perception & Semantics (4 papers)

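#	Title	One-sentence summary	Tags	🔗	⭐
1	OpenFusion++: An Open-vocabulary Real-time Scene Understanding System	Proposes OpenFusion++, an open-vocabulary real-time scene understanding system that improves the accuracy and responsiveness of 3D perception.	scene understanding open-vocabulary open vocabulary
2	Rendering Anywhere You See: Renderability Field-guided Gaussian Splatting	Proposes a renderability field-guided Gaussian splatting method that improves rendering stability in scene view synthesis.	gaussian splatting splatting
3	IM-Portrait: Learning 3D-aware Video Diffusion for Photorealistic Talking Heads from Monocular Videos	Proposes IM-Portrait, a 3D-aware video diffusion method learned from monocular videos for generating photorealistic talking-head videos.	NeRF geometric consistency
4	Leveraging Multi-Modal Saliency and Fusion for Gaze Target Detection	Proposes a gaze target detection method that fuses multi-modal saliency with monocular depth information.	depth estimation monocular depth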

🔬 Pillar 9: Embodied Foundation Models (3 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
5	HoloDx: Knowledge- and Data-Driven Multimodal Diagnosis of Alzheimer's Disease	HoloDx: a multimodal Alzheimer's disease diagnosis framework that combines knowledge and data.	large language model multimodal
6	MERA: Multimodal and Multiscale Self-Explanatory Model with Considerably Reduced Annotation for Lung Nodule Diagnosis	MERA: a multimodal, multiscale self-explanatory model for lung nodule diagnosis with low annotation requirements.	multimodal	✅
7	DeepSPG: Exploring Deep Semantic Prior Guidance for Low-light Image Enhancement with Multimodal Learning	Proposes DeepSPG to address the lack of semantic information in low-light image enhancement.	multimodal	✅

🔬 Pillar 2: RL & Architecture (3 papers)

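#	Title	One-sentence summary	Tags	🔗	⭐
8	DeepInsert: Early Layer Bypass for Efficient and Performant Multimodal Understanding	DeepInsert: improves the efficiency and performance of multimodal understanding through early-layer bypass.	representation learning multimodal
9	CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis	Proposes CARL, camera-agnostic representation learning for spectral images that improves cross-camera generalization.	representation learning scene understanding foundation model
10	Learning to Drive from a World Model	Proposes a world-model-based end-to-end learning framework for autonomous driving that requires no handcrafted rules.	world model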

🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
11	CapsFake: A Multimodal Capsule Network for Detecting Instruction-Guided Deepfakes	CapsFake: proposes a multimodal capsule network for detecting instruction-guided deepfake images.	manipulation multimodal

🔬 Pillar 4: Generative Motion (1 paper)

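#	Title	One-sentence summary	Tags	🔗	⭐
12	Generative AI for Character Animation: A Comprehensive Survey of Techniques, Applications, and Future Directions	Generative AI for character animation: a comprehensive survey of techniques, applications, and future directions.	motion synthesis character animation	✅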
