cs.CV (2024-07-30)

📊 17 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (5) · Pillar 2: RL & Architecture (3) · Pillar 3: Perception & Semantics (3 🔗2) · Pillar 1: Robot Control (2) · Pillar 5: Interaction & Reaction (2) · Pillar 4: Generative Motion (1) · Pillar 6: Video Extraction (1)

🔬 Pillar 9: Embodied Foundation Models (5 papers)

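| # | Title | One-line summary | Tags | 🔗 |
| 1 | AI Safety in Practice: Enhancing Adversarial Robustness in Multimodal Image Captioning | Proposes an adversarial-training-based method for improving the robustness of multimodal image captioning | multimodal | |
| 2 | MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions | Proposes MMTrail, a large-scale multimodal trailer video dataset with language and music descriptions | multimodal | |
| 3 | Benchmarking Histopathology Foundation Models for Ovarian Cancer Bevacizumab Treatment Response Prediction from Whole Slide Images | Uses histopathology foundation models to predict ovarian cancer bevacizumab treatment response from whole slide images (WSIs) | foundation model | |
| 4 | SynthVLM: Towards High-Quality and Efficient Synthesis of Image-Caption Datasets for Vision-Language Models | SynthVLM: high-quality, efficient synthesis of image-text datasets for vision-language models | large language model, multimodal | |
| 5 | Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos | Proposes ClipSitu, which leverages CLIP to generate situational summaries of images and videos, achieving strong situation recognition and localization | multimodal | |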

🔬 Pillar 2: RL & Architecture (3 papers)

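| # | Title | One-line summary | Tags | 🔗 |
| 6 | CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning | CLEFT: language-image contrastive learning with an efficient large language model and prompt fine-tuning, improving performance on medical imaging tasks | representation learning, contrastive learning, large language model | |
| 7 | Prompt-Driven Contrastive Learning for Transferable Adversarial Attacks | Proposes PDCL-Attack, which uses the CLIP model to improve the transferability of generative-model-based adversarial attacks | contrastive learning, foundation model | |
| 8 | SpotFormer: Multi-Scale Spatio-Temporal Transformer for Facial Expression Spotting | Proposes SpotFormer, a multi-scale spatio-temporal Transformer for facial expression spotting | contrastive learning, optical flow | |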

🔬 Pillar 3: Perception & Semantics (3 papers)

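| # | Title | One-line summary | Tags | 🔗 |
| 9 | Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering | DynaVol-S: dynamic scene understanding through object-centric voxelization and neural rendering | NeRF, scene understanding | |
| 10 | NIS-SLAM: Neural Implicit Semantic RGB-D SLAM for 3D Consistent Scene Understanding | NIS-SLAM: neural implicit semantic RGB-D SLAM for 3D-consistent scene understanding | implicit representation, scene understanding | |
| 11 | SceneTeller: Language-to-3D Scene Generation | SceneTeller: proposes a pioneering method for generating high-quality 3D indoor scenes from text descriptions | 3D gaussian splatting, gaussian splatting, splatting | |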

🔬 Pillar 1: Robot Control (2 papers)

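| # | Title | One-line summary | Tags | 🔗 |
| 12 | FACL-Attack: Frequency-Aware Contrastive Learning for Transferable Adversarial Attacks | Proposes FACL-Attack, which uses frequency-domain contrastive learning to improve the cross-domain and cross-model transferability of adversarial examples | domain randomization, contrastive learning | |
| 13 | WARM-3D: A Weakly-Supervised Sim2Real Domain Adaptation Framework for Roadside Monocular 3D Object Detection | Proposes the WARM-3D framework to address Sim2Real domain adaptation in roadside monocular 3D object detection | sim2real | |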

🔬 Pillar 5: Interaction & Reaction (2 papers)

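| # | Title | One-line summary | Tags | 🔗 |
| 14 | Monocular Human-Object Reconstruction in the Wild | Proposes a 2D-supervised method for monocular 3D reconstruction of human-object interaction in the wild | human-object interaction | |
| 15 | StackFLOW: Monocular Human-Object Reconstruction by Stacked Normalizing Flow with Offset | StackFLOW: monocular 3D human-object reconstruction using stacked normalizing flows with offsets | human-object interaction | |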

🔬 Pillar 4: Generative Motion (1 paper)

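| # | Title | One-line summary | Tags | 🔗 |
| 16 | MotionCraft: Crafting Whole-Body Motion with Plug-and-Play Multimodal Controls | MotionCraft: proposes a whole-body motion generation framework with plug-and-play multimodal controls | text-to-motion, motion generation, SMPL | |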

🔬 Pillar 6: Video Extraction (1 paper)

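| # | Title | One-line summary | Tags | 🔗 |
| 17 | EgoSonics: Generating Synchronized Audio for Silent Egocentric Videos | EgoSonics: proposes a method for generating synchronized audio for silent egocentric videos | egocentric | |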
