cs.CV(2024-09-27)

📊 26 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (10 🔗2) · Pillar 2: RL & Architecture (6) · Pillar 3: Perception & Semantics (3 🔗2) · Pillar 1: Robot Control (3) · Pillar 8: Physics-based Animation (1) · Pillar 7: Motion Retargeting (1) · Pillar 4: Generative Motion (1 🔗1) · Pillar 6: Video Extraction (1 🔗1)

🔬 Pillar 9: Embodied Foundation Models (10 papers)

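# | Title | One-line summary | Tags | 🔗
1 | Enhancing Explainability in Multimodal Large Language Models Using Ontological Context | Proposes an ontology-based framework for enhancing multimodal large language models, improving the explainability of plant disease image classification. | large language model, multimodal
2 | FoodMLLM-JP: Leveraging Multimodal Large Language Models for Japanese Recipe Generation | FoodMLLM-JP: uses multimodal large language models to generate Japanese recipes. | large language model, multimodal
3 | Multimodal Pragmatic Jailbreak on Text-to-image Models | Proposes a multimodal pragmatic jailbreak method that exposes and evaluates safety vulnerabilities in text-to-image models. | multimodal
4 | Mixture of Multicenter Experts in Multimodal AI for Debiased Radiotherapy Target Delineation | Proposes a Mixture of Multicenter Experts (MoME) model to address AI bias in radiotherapy target delineation. | multimodal
5 | Multimodal Markup Document Models for Graphic Design Completion | Proposes MarkupDM, a multimodal document model for graphic design completion, enabling design automation. | multimodal
6 | When SAM2 Meets Video Camouflaged Object Segmentation: A Comprehensive Evaluation and Adaptation | Evaluates and adapts SAM2 for video camouflaged object segmentation, improving its detection ability in complex scenes. | large language model, foundation model, multimodal
7 | Image-guided topic modeling for interpretable privacy classification | Proposes an image-guided topic modeling method for interpretable image privacy classification. | large language model, multimodal
8 | MASt3R-SfM: a Fully-Integrated Solution for Unconstrained Structure-from-Motion | MASt3R-SfM: a fully integrated solution for unconstrained Structure-from-Motion. | foundation model
9 | Show and Guide: Instructional-Plan Grounded Vision and Language Model | Proposes MM-PlanLLM for visually guided execution of instructional plans, addressing the lack of multimodal capability in existing models. | multimodal
10 | Improving Visual Object Tracking through Visual Prompting | Proposes PiVOT, a visual-prompting-based tracker that makes visual object tracking more robust to distractors. | foundation model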

🔬 Pillar 2: RL & Architecture (6 papers)

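# | Title | One-line summary | Tags | 🔗
11 | Exploring Token Pruning in Vision State Space Models | Proposes a new token pruning method for vision state space models to improve efficiency. | Mamba, SSM, state space model
12 | MiniVLN: Efficient Vision-and-Language Navigation by Progressive Knowledge Distillation | MiniVLN: efficient vision-and-language navigation via progressive knowledge distillation. | distillation, embodied AI, VLN
13 | How Effective is Pre-training of Large Masked Autoencoders for Downstream Earth Observation Tasks? | Studies how effective pre-training large masked autoencoders is for downstream Earth observation tasks. | masked autoencoder, MAE, foundation model
14 | Student-Oriented Teacher Knowledge Refinement for Knowledge Distillation | Proposes a student-oriented knowledge refinement method to improve knowledge distillation. | distillation
15 | Harmonizing knowledge Transfer in Neural Network with Unified Distillation | Proposes a unified distillation framework that integrates knowledge transfer across multiple network layers to improve model performance. | distillation
16 | You Only Speak Once to See | Proposes the YOSS model, which uses audio guidance to localize objects in images. | contrastive learning, scene understanding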

🔬 Pillar 3: Perception & Semantics (3 papers)

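# | Title | One-line summary | Tags | 🔗
17 | Space-time 2D Gaussian Splatting for Accurate Surface Reconstruction under Complex Dynamic Scenes | Proposes space-time 2D Gaussian splatting for accurate surface reconstruction in complex dynamic scenes. | gaussian splatting, splatting, human-object interaction
18 | Search3D: Hierarchical Open-Vocabulary 3D Segmentation | Proposes Search3D for hierarchical open-vocabulary 3D segmentation and search. | open-vocabulary, open vocabulary
19 | Gaussian Heritage: 3D Digitization of Cultural Heritage with Integrated Object Segmentation | Proposes a Gaussian-splatting-based method for 3D digitization and automatic segmentation of cultural heritage. | gaussian splatting, splatting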

🔬 Pillar 1: Robot Control (3 papers)

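# | Title | One-line summary | Tags | 🔗
20 | SinoSynth: A Physics-based Domain Randomization Approach for Generalizable CBCT Image Enhancement | SinoSynth: a physics-based domain randomization method for generalizable CBCT image enhancement. | domain randomization
21 | S2O: Static to Openable Enhancement for Articulated 3D Objects | Proposes the S2O framework, which turns static 3D objects into interactive, openable 3D objects for robot manipulation and embodied AI. | manipulation, embodied AI
22 | Spectral Wavelet Dropout: Regularization in the Wavelet Domain | Proposes Spectral Wavelet Dropout (SWD), which improves CNN generalization through regularization in the wavelet domain. | manipulation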

🔬 Pillar 8: Physics-based Animation (1 paper)

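# | Title | One-line summary | Tags | 🔗
23 | From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding | Survey paper: applications and challenges of multimodal large language models in long video understanding. | spatiotemporal, large language model, multimodal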

🔬 Pillar 7: Motion Retargeting (1 paper)

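# | Title | One-line summary | Tags | 🔗
24 | UniCal: Unified Neural Sensor Calibration | UniCal: proposes a unified neural sensor calibration framework for multi-sensor calibration in autonomous vehicles. | geometric consistency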

🔬 Pillar 4: Generative Motion (1 paper)

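# | Title | One-line summary | Tags | 🔗
25 | PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation | PhysGen: a rigid-body-physics-grounded image-to-video generation method that produces realistic, controllable videos. | physically plausible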

🔬 Pillar 6: Video Extraction (1 paper)

# | Title | One-line summary | Tags | 🔗
26 | Multi-hypotheses Conditioned Point Cloud Diffusion for 3D Human Reconstruction from Occluded Images | Proposes MHCDIFF for reconstructing detailed 3D humans from occluded images. | SMPL
