cs.CV (2024-08-13)

📊 21 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (6 🔗1) · Pillar 9: Embodied Foundation Models (6 🔗1) · Pillar 2: RL & Architecture (3) · Pillar 5: Interaction & Reaction (2) · Pillar 4: Generative Motion (1) · Pillar 1: Robot Control (1) · Pillar 7: Motion Retargeting (1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 3: Perception & Semantics (6 papers)

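| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | SpectralGaussians: Semantic, spectral 3D Gaussian splatting for multi-spectral scene representation, visualization and analysis | Proposes SpectralGaussians, a semantic, spectral 3D Gaussian splatting approach for multi-spectral scene representation, visualization, and analysis. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 2 | HDRGS: High Dynamic Range Gaussian Splatting | Proposes the HDR-GS method, which uses high dynamic range Gaussian splatting to reconstruct high-quality HDR scenes. | gaussian splatting, splatting, NeRF | |
| 3 | NeRF-US: Removing Ultrasound Imaging Artifacts from Neural Radiance Fields in the Wild | NeRF-US: proposes a method for removing artifacts from neural radiance fields of ultrasound imaging in the wild. | NeRF, neural radiance field | |
| 4 | SceneGPT: A Language Model for 3D Scene Understanding | SceneGPT: a language model for 3D scene understanding that requires no 3D pre-training. | scene understanding, affordance, spatial relationship | |
| 5 | SlotLifter: Slot-guided Feature Lifting for Learning Object-centric Radiance Fields | Proposes SlotLifter, which learns object-centric radiance fields through slot-guided feature lifting, enabling scene reconstruction and decomposition. | scene reconstruction | |
| 6 | ActiveNeRF: Learning Accurate 3D Geometry by Active Pattern Projection | ActiveNeRF: learns accurate 3D geometry through active pattern projection. | NeRF | |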

🔬 Pillar 9: Embodied Foundation Models (6 papers)

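| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 7 | CROME: Cross-Modal Adapters for Efficient Multimodal LLM | CROME: cross-modal adapters for efficient multimodal LLMs. | large language model, multimodal, instruction following | |
| 8 | PathInsight: Instruction Tuning of Multimodal Datasets and Models for Intelligence Assisted Diagnosis in Histopathology | PathInsight: instruction-tunes multimodal models to support intelligence-assisted diagnosis in histopathology. | multimodal | |
| 9 | Multimodal Analysis of White Blood Cell Differentiation in Acute Myeloid Leukemia Patients using a β-Variational Autoencoder | Proposes a β-VAE-based multimodal analysis method for understanding white blood cell differentiation in acute myeloid leukemia patients. | multimodal | |
| 10 | Sumotosima: A Framework and Dataset for Classifying and Summarizing Otoscopic Images | Sumotosima: a deep learning framework and dataset for classifying and summarizing otoscopic images. | multimodal | |
| 11 | DC3DO: Diffusion Classifier for 3D Objects | DC3DO: uses diffusion models for zero-shot 3D object classification with no additional training. | multimodal | |
| 12 | Specialized Change Detection using Segment Anything | Proposes a SAM-based specialized change detection method that addresses detecting the disappearance of specific targets. | foundation model | |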

🔬 Pillar 2: RL & Architecture (3 papers)

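| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 13 | Integrating Saliency Ranking and Reinforcement Learning for Enhanced Object Detection | Combines saliency ranking with reinforcement learning to improve lightweight object detection performance. | reinforcement learning, deep reinforcement learning | |
| 14 | Breaking Class Barriers: Efficient Dataset Distillation via Inter-Class Feature Compensator | Proposes the Inter-class Feature Compensator (INFER), which efficiently addresses inter-class feature isolation in dataset distillation. | distillation | |
| 15 | Oracle Bone Script Similiar Character Screening Approach Based on Simsiam Contrastive Learning and Supervised Learning | Proposes a method for screening similar oracle bone script characters based on SimSiam contrastive learning and supervised learning. | contrastive learning | |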

🔬 Pillar 5: Interaction & Reaction (2 papers)

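| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 16 | Efficient Human-Object-Interaction (EHOI) Detection via Interaction Label Coding and Conditional Decision | Proposes EHOI, an efficient human-object interaction detector that balances performance, efficiency, and interpretability. | human-object interaction, HOI | |
| 17 | MV-DETR: Multi-modality indoor object detection by Multi-View DEtecton TRansformers | MV-DETR: multimodal indoor object detection based on multi-view DETR Transformers. | ReMoS | |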

🔬 Pillar 4: Generative Motion (1 paper)

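| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 18 | ViMo: Generating Motions from Casual Videos | Proposes ViMo to address the challenge of generating 3D human motion from video. | motion generation, video-to-motion | |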

🔬 Pillar 1: Robot Control (1 paper)

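| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 19 | Controlling the World by Sleight of Hand | CosHand: proposes an action-conditioned generative model that predicts how an image changes after a hand interacts with an object. | manipulation, world model | |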

🔬 Pillar 7: Motion Retargeting (1 paper)

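| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 20 | Visual Neural Decoding via Improved Visual-EEG Semantic Consistency | Proposes a Visual-EEG semantic decoupling framework that improves semantic consistency in visual neural decoding from EEG signals. | geometric consistency | |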

🔬 Pillar 8: Physics-based Animation (1 paper)

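| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 21 | Dynamic and Compressive Adaptation of Transformers From Images to Videos | Proposes InTI, which achieves compressive adaptation of Transformers from images to videos through dynamic inter-frame token interpolation. | spatiotemporal | |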
