cs.CV (2024-11-19)

📊 27 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (8 🔗3) · Pillar 3: Perception & Semantics (8) · Pillar 9: Embodied Foundation Models (8 🔗1) · Pillar 1: Robot Control (1) · Pillar 5: Interaction & Reaction (1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 2: RL & Architecture (8 papers)

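| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Gradient-Weighted Feature Back-Projection: A Fast Alternative to Feature Distillation in 3D Gaussian Splatting | Proposes a fast feature-field rendering method for 3D Gaussian Splatting based on gradient-weighted feature back-projection. | distillation, 3D gaussian splatting, gaussian splatting | |
| 2 | GaussianPretrain: A Simple Unified 3D Gaussian Representation for Visual Pre-training in Autonomous Driving | Proposes GaussianPretrain, which performs visual pre-training for autonomous driving through a unified 3D Gaussian representation. | visual pre-training, NeRF, scene understanding | |
| 3 | Towards motion from video diffusion models | Uses video diffusion models and SDS to guide an SMPL-X human body model toward realistic animation. | distillation, SMPL, SMPL-X | |
| 4 | KDC-MAE: Knowledge Distilled Contrastive Mask Auto-Encoder | Proposes KDC-MAE, which combines contrastive learning, knowledge distillation, and masked auto-encoders to improve self-supervised representation learning. | MAE, contrastive learning, distillation | |
| 5 | DGTR: Distributed Gaussian Turbo-Reconstruction for Sparse-View Vast Scenes | Proposes DGTR, an efficient distributed Gaussian reconstruction method for sparse-view vast scenes. | distillation, scene reconstruction, geometric consistency | |
| 6 | Data-to-Model Distillation: Data-Efficient Learning Framework | Proposes the Data-to-Model Distillation (D2M) framework for efficient, scalable dataset distillation. | distillation | |
| 7 | What Makes a Good Dataset for Knowledge Distillation? | Studies dataset selection in knowledge distillation and shows that non-real data can also transfer knowledge effectively. | distillation | |
| 8 | CLIC: Contrastive Learning Framework for Unsupervised Image Complexity Representation | Proposes CLIC, a contrastive-learning framework for unsupervised image complexity representation. | contrastive learning | |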

🔬 Pillar 3: Perception & Semantics (8 papers)

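| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 9 | Efficient Physics Simulation for 3D Scenes via MLLM-Guided Gaussian Splatting | Proposes Sim Anything to address the efficiency of physics simulation for 3D scenes. | gaussian splatting, splatting, scene reconstruction | |
| 10 | Sketch-guided Cage-based 3D Gaussian Splatting Deformation | Proposes a sketch-guided, cage-based deformation method for 3D Gaussian Splatting, enabling fine-grained geometric editing and animation. | 3D gaussian splatting, gaussian splatting, splatting | |
| 11 | PR-ENDO: Physically Based Relightable Gaussian Splatting for Endoscopy | PR-ENDO: physically based, relightable Gaussian Splatting reconstruction for endoscopy. | 3D gaussian splatting, gaussian splatting, splatting | |
| 12 | SCIGS: 3D Gaussians Splatting from a Snapshot Compressive Image | SCIGS: a 3D Gaussian Splatting method that reconstructs dynamic scenes from a single compressive image. | 3DGS, splatting, NeRF | |
| 13 | Beyond Gaussians: Fast and High-Fidelity 3D Splatting with Linear Kernels | Proposes 3D Linear Splatting (3DLS), which replaces Gaussian kernels with linear kernels to improve the quality and speed of novel view synthesis. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 14 | Mini-Splatting2: Building 360 Scenes within Minutes via Aggressive Gaussian Densification | Mini-Splatting2: builds 360-degree scenes within minutes via aggressive Gaussian densification. | gaussian splatting, splatting | |
| 15 | Maps from Motion (MfM): Generating 2D Semantic Maps from Sparse Multi-view Images | Proposes Maps from Motion (MfM), which generates 2D semantic maps from sparse multi-view images. | semantic map | |
| 16 | MTFusion: Reconstructing Any 3D Object from Single Image Using Multi-word Textual Inversion | MTFusion: reconstructs any 3D object from a single image using multi-word textual inversion. | neural radiance field | |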

🔬 Pillar 9: Embodied Foundation Models (8 papers)

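| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 17 | Visual-Oriented Fine-Grained Knowledge Editing for MultiModal Large Language Models | Proposes the MSCKE framework for visual-oriented, fine-grained knowledge editing in multimodal large language models. | large language model, multimodal | |
| 18 | Med-2E3: A 2D-Enhanced 3D Medical Multimodal Large Language Model | Med-2E3: a 2D-enhanced 3D medical multimodal large language model that improves 3D medical image analysis. | large language model, multimodal | |
| 19 | Large Language Models for Lossless Image Compression: Next-Pixel Prediction in Language Space is All You Need | Proposes P²-LLM, which uses a large language model for high-performance lossless image compression. | large language model | |
| 20 | From Holistic to Localized: Local Enhanced Adapters for Efficient Visual Instruction Fine-Tuning | Proposes local enhanced adapters that improve the efficiency of visual instruction fine-tuning through dual structural optimization. | large language model, multimodal | |
| 21 | HouseTune: Two-Stage Floorplan Generation with LLM Assistance | HouseTune: a two-stage floorplan generation framework that combines an LLM with a diffusion model. | large language model, chain-of-thought | |
| 22 | Mitigating Perception Bias: A Training-Free Approach to Enhance LMM for Image Quality Assessment | Proposes a training-free debiasing framework that improves large models on image quality assessment. | multimodal | |
| 23 | Generative Timelines for Instructed Visual Assembly | Proposes Timeline Assembler, which generatively edits visual timelines from natural language instructions. | multimodal | |
| 24 | Neuro-3D: Towards 3D Visual Decoding from EEG Signals | Proposes the Neuro-3D framework for 3D visual decoding from EEG signals. | multimodal | |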

🔬 Pillar 1: Robot Control (1 paper)

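| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 25 | HyperGAN-CLIP: A Unified Framework for Domain Adaptation, Image Synthesis and Manipulation | HyperGAN-CLIP: a unified framework for domain adaptation, image synthesis, and manipulation. | manipulation | |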

🔬 Pillar 5: Interaction & Reaction (1 paper)

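| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 26 | VioPose: Violin Performance 4D Pose Estimation by Hierarchical Audiovisual Inference | VioPose: 4D pose estimation of violin performance using hierarchical audiovisual inference. | human-object interaction, multimodal | |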

🔬 Pillar 8: Physics-based Animation (1 paper)

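| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 27 | IoT-Based 3D Pose Estimation and Motion Optimization for Athletes: Application of C3D and OpenPose | Proposes IE-PONet for 3D pose estimation and motion optimization of track-and-field athletes. | spatiotemporal, multimodal | |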
