cs.CV (2024-04-01)

📊 37 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (17 papers, 🔗 5 with code)
Pillar 9: Embodied Foundation Models (8 papers)
Pillar 2: RL Algorithms & Architecture (6 papers)
Pillar 7: Motion Retargeting (2 papers, 🔗 1 with code)
Pillar 4: Generative Motion (1 paper)
Pillar 6: Video Extraction & Matching (1 paper)
Pillar 1: Robot Control (1 paper)
Pillar 8: Physics-based Animation (1 paper)

🔬 Pillar 3: Spatial Perception & Semantics (17 papers)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Mirror-3DGS: Incorporating Mirror Reflections into 3D Gaussian Splatting | Proposes Mirror-3DGS to model mirror reflections | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 2 | MM3DGS SLAM: Multi-modal 3D Gaussian Splatting for SLAM Using Vision, Depth, and Inertial Measurements | Proposes MM3DGS to address multi-modal map representation in SLAM | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 3 | GOV-NeSF: Generalizable Open-Vocabulary Neural Semantic Fields | Proposes GOV-NeSF to improve generalization in open-vocabulary 3D scene understanding | implicit representation, scene understanding, open-vocabulary | |
| 4 | SGCNeRF: Few-Shot Neural Rendering via Sparse Geometric Consistency Guidance | Proposes SGCNeRF for neural rendering from sparse views | NeRF, neural radiance field, feature matching | |
| 5 | OVFoodSeg: Elevating Open-Vocabulary Food Image Segmentation via Image-Informed Textual Representation | Proposes OVFoodSeg for open-vocabulary food image segmentation | open-vocabulary, open vocabulary | |
| 6 | Open-Vocabulary Object Detectors: Robustness Challenges under Distribution Shifts | Evaluates the out-of-distribution (OOD) robustness of open-vocabulary object detectors | open-vocabulary, open vocabulary | |
| 7 | From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models | Proposes an open-vocabulary scene graph generation framework to address the generation of visual relation concepts | open-vocabulary, open vocabulary | |
| 8 | CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians | Proposes CityGaussian for real-time rendering of large-scale scenes | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 9 | Feature Splatting: Language-Driven Physics-Based Scene Synthesis and Editing | Proposes Feature Splatting for dynamic scene synthesis and editing | splatting, foundation model | |
| 10 | Neural Implicit Representation for Building Digital Twins of Unknown Articulated Objects | Proposes a neural implicit representation for building digital twins of unknown articulated objects | 3D reconstruction, implicit representation | |
| 11 | HAHA: Highly Articulated Gaussian Human Avatars with Textured Mesh Prior | Proposes HAHA to generate animatable human avatars from monocular video | gaussian splatting, splatting, SMPL | |
| 12 | 360+x: A Panoptic Multi-modal Scene Understanding Dataset | Proposes the 360+x dataset for multi-view, multi-modal scene understanding | scene understanding, egocentric | |
| 13 | BadPart: Unified Black-box Adversarial Patch Attacks against Pixel-wise Regression Tasks | Proposes the BadPart framework for black-box adversarial attacks on pixel-wise regression tasks | depth estimation, monocular depth, optical flow | |
| 14 | LoSA: Long-Short-range Adapter for Scaling End-to-End Temporal Action Localization | Proposes LoSA to overcome memory constraints in temporal action localization on long videos | optical flow, foundation model | |
| 15 | Scalable Scene Modeling from Perspective Imaging: Physics-based Appearance and Geometry Inference | Proposes a physics-based 3D scene modeling approach to address the limitations of deep learning | scene reconstruction | |
| 16 | StructLDM: Structured Latent Diffusion for 3D Human Generation | Proposes StructLDM to provide a structured representation for 3D human generation | NeRF | |
| 17 | Improving Visual Recognition with Hyperbolical Visual Hierarchy Mapping | Proposes Hi-Mapper to enhance hierarchical recognition of visual scenes | scene understanding | |

🔬 Pillar 9: Embodied Foundation Models (8 papers)

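| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 18 | CosmicMan: A Text-to-Image Foundation Model for Humans | Proposes CosmicMan to address the insufficient quality of human image generation | foundation model | |
| 19 | Bridging Remote Sensors with Multisensor Geospatial Foundation Models | Proposes msGFM to unify multiple types of remote-sensing data and improve geospatial analysis | foundation model | |
| 20 | iMD4GC: Incomplete Multimodal Data Integration to Advance Precise Treatment Response Prediction and Survival Analysis for Gastric Cancer | Proposes iMD4GC to handle incomplete multimodal data in gastric cancer | multimodal | |
| 21 | Harnessing Large Language Models for Training-free Video Anomaly Detection | Proposes LAVAD to remove the training dependence of video anomaly detection | large language model | |
| 22 | Learning by Correction: Efficient Tuning Task for Zero-Shot Generative Vision-Language Reasoning | Proposes image-conditioned caption correction to improve the zero-shot performance of generative models | large language model, instruction following | |
| 23 | Prompt Learning for Oriented Power Transmission Tower Detection in High-Resolution SAR Images | Proposes P2Det for power transmission tower detection in high-resolution SAR images | multimodal | |
| 24 | LLMs are Good Sign Language Translators | Proposes the SignLLM framework for sign language translation | large language model | |
| 25 | LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction | Proposes LLaMA-Excitor to preserve pre-trained knowledge during LLM fine-tuning | instruction following | |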

🔬 Pillar 2: RL Algorithms & Architecture (6 papers)

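| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 26 | NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields | Proposes NeRF-MAE for self-supervised 3D representation learning | representation learning, masked autoencoder, MAE | |
| 27 | Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward | Proposes a direct preference optimization framework driven by language-model rewards to improve video large multimodal models | DPO, direct preference optimization, large language model | |
| 28 | SpikeMba: Multi-Modal Spiking Saliency Mamba for Temporal Video Grounding | Proposes SpikeMba for temporal video grounding in video content understanding | Mamba, SSM, state space model | |
| 29 | FlexiDreamer: Single Image-to-3D Generation with FlexiCubes | Proposes FlexiDreamer to reconstruct high-quality 3D meshes from multi-view images | dreamer, NeRF, implicit representation | |
| 30 | CAMO: Correlation-Aware Mask Optimization with Modulated Reinforcement Learning | Proposes CAMO for mask optimization in the lithography process | reinforcement learning | |
| 31 | A Comprehensive Review of Knowledge Distillation in Computer Vision | Surveys knowledge distillation in computer vision as a way to reduce model complexity | distillation | |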

🔬 Pillar 7: Motion Retargeting (2 papers)

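| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 32 | A Unified and Interpretable Emotion Representation and Expression Generation | Proposes C2A2, a unified and interpretable model for emotion representation and expression generation | motion representation | |
| 33 | SurMo: Surface-based 4D Motion Modeling for Dynamic Human Rendering | Proposes SurMo to address insufficient modeling of temporal motion relationships in dynamic human rendering | human motion | |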

🔬 Pillar 4: Generative Motion (1 paper)

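| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 34 | Large Motion Model for Unified Multi-Modal Motion Generation | Proposes a large motion model that unifies multi-modal motion generation tasks | text-to-motion, motion generation, human motion | |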

🔬 Pillar 6: Video Extraction & Matching (1 paper)

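| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 35 | DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery | Proposes DPMesh for human mesh recovery under severe occlusion | human mesh recovery, spatial relationship | |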

🔬 Pillar 1: Robot Control (1 paper)

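| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 36 | SUGAR: Pre-training 3D Visual Representations for Robotics | Proposes the SUGAR framework to address the limitations of existing 3D visual representation learning | manipulation, representation learning, distillation | |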

🔬 Pillar 8: Physics-based Animation (1 paper)

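| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 37 | Structured Initialization for Attention in Vision Transformers | Proposes structured initialization to improve Vision Transformer performance on small-scale datasets | PULSE | |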
