cs.CV (2025-01-07)

📊 23 papers in total | 🔗 5 with code

🎯 Navigation by Area of Interest

Pillar 9: Embodied Foundation Models (8 🔗2) | Pillar 2: RL Algorithms & Architecture (7 🔗3) | Pillar 3: Spatial Perception & Semantics (4) | Pillar 1: Robot Control (2) | Pillar 6: Video Extraction & Matching (1) | Pillar 7: Motion Retargeting (1)

🔬 Pillar 9: Embodied Foundation Models (8 papers)

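# | Title | One-line summary | Tags | 🔗
1 | LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token | LLaVA-Mini: efficient image and video large multimodal models that use a single vision token | large language model, multimodal
2 | MM-GEN: Enhancing Task Performance Through Targeted Multimodal Data Curation | MM-GEN: improves task-specific vision-language model performance through targeted multimodal data generation | multimodal
3 | MADation: Face Morphing Attack Detection with Foundation Models | Proposes MADation, which uses foundation models for face morphing attack detection and outperforms existing methods. | foundation model
4 | Detection, Retrieval, and Explanation Unified: A Violence Detection System Based on Knowledge Graphs and GAT | Proposes a unified system for violence detection, retrieval, and explanation based on knowledge graphs and graph attention networks (GAT). | large language model, multimodal
5 | Bridged Semantic Alignment for Zero-shot 3D Medical Image Diagnosis | Proposes BrgSA, a bridged semantic alignment framework for zero-shot 3D medical image diagnosis. | VLA, large language model
6 | Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos | Sa2VA: combines SAM2 with LLaVA for dense grounded understanding of images and videos | large language model
7 | Vision Language Models as Values Detectors | Explores the potential of vision-language models as values detectors, applied to understanding home environments | large language model
8 | SMIR: Efficient Synthetic Data Pipeline To Improve Multi-Image Reasoning | SMIR: an efficient synthetic data pipeline that improves multi-image reasoning | multimodal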

🔬 Pillar 2: RL Algorithms & Architecture (7 papers)

# | Title | One-line summary | Tags | 🔗
9 | ConcealGS: Concealing Invisible Copyright Information in 3D Gaussian Splatting | ConcealGS: a method for embedding invisible copyright information in 3D Gaussian Splatting | distillation, 3D gaussian splatting, gaussian splatting
10 | CL3DOR: Contrastive Learning for 3D Large Multimodal Models via Odds Ratio on High-Resolution Point Clouds | CL3DOR: improves 3D large multimodal models via odds-ratio-based contrastive learning on high-resolution point clouds | contrastive learning, scene understanding, large language model
11 | NeuralSVG: An Implicit Representation for Text-to-Vector Generation | NeuralSVG: an implicit-representation method for text-to-vector graphics generation that produces more structured and flexible SVGs. | distillation, NeRF, neural radiance field
12 | LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving | LargeAD: a large-scale cross-sensor data pretraining framework for autonomous driving | representation learning, contrastive learning, scene understanding
13 | Cosmos World Foundation Model Platform for Physical AI | NVIDIA presents the Cosmos world foundation model platform, which helps physical AI developers build customized world models | world model, foundation model
14 | Information-Maximized Soft Variable Discretization for Self-Supervised Image Representation Learning | Proposes information-maximized soft variable discretization (IMSVD) for self-supervised image representation learning | representation learning, contrastive learning, foundation model
15 | An Empirical Study of Accuracy-Robustness Tradeoff and Training Efficiency in Self-Supervised Learning | Proposes CF-AMC-SSL, which speeds up convergence in self-supervised learning and improves the balance between accuracy and robustness. | representation learning, contrastive learning

🔬 Pillar 3: Spatial Perception & Semantics (4 papers)

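# | Title | One-line summary | Tags | 🔗
16 | DehazeGS: Seeing Through Fog with 3D Gaussian Splatting | Proposes DehazeGS, which uses 3D Gaussian Splatting to dehaze foggy images and synthesize high-quality novel views. | 3D gaussian splatting, gaussian splatting, splatting
17 | MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval Adjustment for Compact Dynamic 3D Gaussian Splatting | MoDec-GS: a compact dynamic 3D Gaussian Splatting method for complex dynamic scenes, based on global-to-local motion decomposition | 3D gaussian splatting, 3DGS, gaussian splatting
18 | NeRFs are Mirror Detectors: Using Structural Similarity for Multi-View Mirror Scene Reconstruction with 3D Surface Primitives | NeRF-MD: uses structural similarity for multi-view 3D surface reconstruction of scenes containing mirrors | NeRF, neural radiance field, scene reconstruction
19 | ZDySS -- Zero-Shot Dynamic Scene Stylization using Gaussian Splatting | Proposes ZDySS, which uses Gaussian Splatting for zero-shot style transfer of dynamic scenes | gaussian splatting, splatting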

🔬 Pillar 1: Robot Control (2 papers)

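# | Title | One-line summary | Tags | 🔗
20 | Advancing the Understanding of Fine-Grained 3D Forest Structures using Digital Cousins and Simulation-to-Reality: Methods and Datasets | Proposes a framework for synthesizing 3D forest structures based on digital cousins and Sim2Real, and builds Boreal3D, a large-scale forest point cloud dataset. | sim2real, scene understanding
21 | Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control | DaS: uses a 3D-aware video diffusion model for versatile video generation control | manipulation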

🔬 Pillar 6: Video Extraction & Matching (1 paper)

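# | Title | One-line summary | Tags | 🔗
22 | Graph-Based Multimodal and Multi-view Alignment for Keystep Recognition | Proposes a graph-based multimodal, multi-view alignment framework that improves keystep recognition accuracy in egocentric videos. | egocentric, multimodal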

🔬 Pillar 7: Motion Retargeting (1 paper)

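# | Title | One-line summary | Tags | 🔗
23 | Extraction Of Cumulative Blobs From Dynamic Gestures | Proposes a dynamic gesture recognition method based on a night-vision camera, enabling gesture interaction in low-light environments | human motion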
