cs.CV (2024-07-30)

📊 17 papers total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (5) Pillar 2: RL & Architecture (3) Pillar 3: Perception & Semantics (3 🔗2) Pillar 1: Robot Control (2) Pillar 5: Interaction & Reaction (2) Pillar 4: Generative Motion (1) Pillar 6: Video Extraction (1)

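🔬 Pillar 9: Embodied Foundation Models (5 papers)

#	Title	One-Line Summary	Tags	🔗	⭐
1	AI Safety in Practice: Enhancing Adversarial Robustness in Multimodal Image Captioning	Proposes an adversarial-training-based method for improving the robustness of multimodal image captioning	multimodal
2	MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions	Presents MMTrail, a large-scale multimodal trailer video dataset with language and music descriptions	multimodal
3	Benchmarking Histopathology Foundation Models for Ovarian Cancer Bevacizumab Treatment Response Prediction from Whole Slide Images	Uses histopathology foundation models to predict ovarian cancer bevacizumab treatment response from whole slide images (WSIs)	foundation model
4	SynthVLM: Towards High-Quality and Efficient Synthesis of Image-Caption Datasets for Vision-Language Models	SynthVLM: high-quality, efficient synthesis of image-caption datasets for vision-language models	large language model multimodal
5	Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos	Proposes ClipSitu, which leverages CLIP to generate situational summaries of images and videos for situation recognition and localization	multimodal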

🔬 Pillar 2: RL & Architecture (3 papers)

#	Title	One-Line Summary	Tags	🔗	⭐
6	CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning	CLEFT: language-image contrastive learning with an efficient large language model and prompt fine-tuning, improving performance on medical imaging tasks	representation learning contrastive learning large language model
7	Prompt-Driven Contrastive Learning for Transferable Adversarial Attacks	Proposes PDCL-Attack, which uses CLIP to improve the transferability of generative-model-based adversarial attacks	contrastive learning foundation model
8	SpotFormer: Multi-Scale Spatio-Temporal Transformer for Facial Expression Spotting	Proposes SpotFormer, a multi-scale spatio-temporal Transformer for facial expression spotting	contrastive learning optical flow

🔬 Pillar 3: Perception & Semantics (3 papers)

#	Title	One-Line Summary	Tags	🔗	⭐
9	Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering	DynaVol-S: dynamic scene understanding through object-centric voxelization and neural rendering	NeRF scene understanding
10	NIS-SLAM: Neural Implicit Semantic RGB-D SLAM for 3D Consistent Scene Understanding	NIS-SLAM: neural implicit semantic RGB-D SLAM for 3D-consistent scene understanding	implicit representation scene understanding	✅
11	SceneTeller: Language-to-3D Scene Generation	SceneTeller: a method for generating high-quality 3D indoor scenes from text descriptions	3D gaussian splatting gaussian splatting splatting	✅

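🔬 Pillar 1: Robot Control (2 papers)

#	Title	One-Line Summary	Tags	🔗	⭐
12	FACL-Attack: Frequency-Aware Contrastive Learning for Transferable Adversarial Attacks	Proposes FACL-Attack, which uses frequency-domain contrastive learning to improve the cross-domain and cross-model transferability of adversarial examples	domain randomization contrastive learning
13	WARM-3D: A Weakly-Supervised Sim2Real Domain Adaptation Framework for Roadside Monocular 3D Object Detection	Proposes the WARM-3D framework for Sim2Real domain adaptation in roadside monocular 3D object detection	sim2real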

🔬 Pillar 5: Interaction & Reaction (2 papers)

#	Title	One-Line Summary	Tags	🔗	⭐
14	Monocular Human-Object Reconstruction in the Wild	Proposes a 2D-supervised method for monocular 3D reconstruction of human-object interactions in the wild	human-object interaction
15	StackFLOW: Monocular Human-Object Reconstruction by Stacked Normalizing Flow with Offset	StackFLOW: monocular 3D human-object reconstruction using stacked normalizing flows with offsets	human-object interaction

🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-Line Summary	Tags	🔗	⭐
16	MotionCraft: Crafting Whole-Body Motion with Plug-and-Play Multimodal Controls	MotionCraft: a whole-body motion generation framework with plug-and-play multimodal controls	text-to-motion motion generation SMPL

🔬 Pillar 6: Video Extraction (1 paper)

#	Title	One-Line Summary	Tags	🔗	⭐
17	EgoSonics: Generating Synchronized Audio for Silent Egocentric Videos	EgoSonics: a method for generating synchronized audio for silent egocentric videos	egocentric
