cs.CV (2025-01-07)

📊 23 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (8 🔗2) · Pillar 2: RL & Architecture (7 🔗3) · Pillar 3: Perception & Semantics (4) · Pillar 1: Robot Control (2) · Pillar 6: Video Extraction (1) · Pillar 7: Motion Retargeting (1)

🔬 Pillar 9: Embodied Foundation Models (8 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token	LLaVA-Mini: efficient image and video large multimodal models using a single vision token	large language model multimodal
2	MM-GEN: Enhancing Task Performance Through Targeted Multimodal Data Curation	MM-GEN: improves vision-language model performance on specific tasks through targeted multimodal data generation	multimodal	✅
3	MADation: Face Morphing Attack Detection with Foundation Models	Proposes MADation, which uses foundation models for face morphing attack detection and outperforms existing methods	foundation model	✅
4	Detection, Retrieval, and Explanation Unified: A Violence Detection System Based on Knowledge Graphs and GAT	Proposes a unified system for violence detection, retrieval, and explanation based on knowledge graphs and graph attention networks	large language model multimodal
5	Bridged Semantic Alignment for Zero-shot 3D Medical Image Diagnosis	Proposes BrgSA, a bridged semantic alignment framework for zero-shot 3D medical image diagnosis	VLA large language model
6	Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos	Sa2VA: combines SAM2 with LLaVA for dense grounded understanding of images and videos	large language model
7	Vision Language Models as Values Detectors	Explores the potential of vision-language models as values detectors, applied to understanding home environments	large language model
8	SMIR: Efficient Synthetic Data Pipeline To Improve Multi-Image Reasoning	SMIR: an efficient synthetic data pipeline that improves multi-image reasoning	multimodal

🔬 Pillar 2: RL & Architecture (7 papers)

#	Title	One-line summary	Tags	🔗	⭐
9	ConcealGS: Concealing Invisible Copyright Information in 3D Gaussian Splatting	ConcealGS: proposes a method for concealing invisible copyright information in 3D Gaussian Splatting	distillation 3D gaussian splatting gaussian splatting
10	CL3DOR: Contrastive Learning for 3D Large Multimodal Models via Odds Ratio on High-Resolution Point Clouds	CL3DOR: improves 3D large multimodal models through odds-ratio contrastive learning on high-resolution point clouds	contrastive learning scene understanding large language model
11	NeuralSVG: An Implicit Representation for Text-to-Vector Generation	NeuralSVG: proposes an implicit-representation method for text-to-vector generation that yields more structured and flexible SVG output	distillation NeRF neural radiance field
12	LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving	LargeAD: a large-scale cross-sensor data pretraining framework for autonomous driving	representation learning contrastive learning scene understanding
13	Cosmos World Foundation Model Platform for Physical AI	NVIDIA presents the Cosmos World Foundation Model Platform, which helps build customized world models for physical AI	world model foundation model	✅
14	Information-Maximized Soft Variable Discretization for Self-Supervised Image Representation Learning	Proposes Information-Maximized Soft Variable Discretization (IMSVD), a self-supervised method for image representation learning	representation learning contrastive learning foundation model	✅
15	An Empirical Study of Accuracy-Robustness Tradeoff and Training Efficiency in Self-Supervised Learning	Proposes CF-AMC-SSL, which speeds up convergence in self-supervised learning and improves the balance between accuracy and robustness	representation learning contrastive learning	✅

🔬 Pillar 3: Perception & Semantics (4 papers)

#	Title	One-line summary	Tags	🔗	⭐
16	DehazeGS: Seeing Through Fog with 3D Gaussian Splatting	Proposes DehazeGS, which uses 3D Gaussian Splatting to dehaze foggy images and synthesize high-quality novel views	3D gaussian splatting gaussian splatting splatting
17	MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval Adjustment for Compact Dynamic 3D Gaussian Splatting	MoDec-GS: a compact dynamic 3D Gaussian Splatting method with global-to-local motion decomposition for complex dynamic scenes	3D gaussian splatting 3DGS gaussian splatting
18	NeRFs are Mirror Detectors: Using Structural Similarity for Multi-View Mirror Scene Reconstruction with 3D Surface Primitives	NeRF-MD: uses structural similarity for multi-view 3D surface reconstruction of mirror scenes	NeRF neural radiance field scene reconstruction
19	ZDySS -- Zero-Shot Dynamic Scene Stylization using Gaussian Splatting	Proposes ZDySS, which uses Gaussian Splatting for zero-shot style transfer of dynamic scenes	gaussian splatting splatting

🔬 Pillar 1: Robot Control (2 papers)

#	Title	One-line summary	Tags	🔗	⭐
20	Advancing the Understanding of Fine-Grained 3D Forest Structures using Digital Cousins and Simulation-to-Reality: Methods and Datasets	Proposes a forest 3D structure synthesis framework based on digital cousins and Sim2Real, and builds Boreal3D, a large-scale forest point cloud dataset	sim2real scene understanding
21	Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control	DaS: uses a 3D-aware video diffusion model for versatile video generation control	manipulation

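🔬 Pillar 6: Video Extraction (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
22	Graph-Based Multimodal and Multi-view Alignment for Keystep Recognition	Proposes a graph-learning-based multimodal, multi-view alignment framework to improve keystep recognition accuracy in egocentric video	egocentric multimodal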

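🔬 Pillar 7: Motion Retargeting (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
23	Extraction Of Cumulative Blobs From Dynamic Gestures	Proposes a dynamic gesture recognition method based on a night-vision camera, addressing gesture interaction in low-light environments	human motion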
