cs.CV (2024-06-07)

📊 26 papers in total | 🔗 10 with code

🎯 Topic Navigation

Pillar 9: Embodied Foundation Models (7 🔗5) · Pillar 3: Spatial Perception & Semantics (6 🔗3) · Pillar 2: RL Algorithms & Architecture (5) · Pillar 1: Robot Control (3) · Pillar 7: Motion Retargeting (2 🔗1) · Pillar 5: Interaction & Reaction (1 🔗1) · Pillar 6: Video Extraction & Matching (1) · Pillar 8: Physics-based Animation (1)

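🔬 Pillar 9: Embodied Foundation Models (7 papers)

#	Title	Key Point	Tags	🔗	⭐
1	Towards Semantic Equivalence of Tokenization in Multimodal LLM	Proposes SeTok, a dynamic semantic-equivalent visual tokenization method that improves multimodal large language model performance	large language model multimodal	✅
2	MGIMM: Multi-Granularity Instruction Multimodal Model for Attribute-Guided Remote Sensing Image Detailed Description	Proposes MGIMM, which uses multi-granularity instruction learning to generate attribute-guided detailed descriptions of remote sensing images.	large language model multimodal	✅
3	VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging	VISTA3D: a unified segmentation foundation model for 3D medical imaging	foundation model	✅
4	RU-AI: A Large Multimodal Dataset for Machine-Generated Content Detection	Introduces RU-AI, a large-scale multimodal dataset for detecting machine-generated content	multimodal	✅
5	Interpretable Multimodal Out-of-context Detection with Soft Logic Regularization	Proposes LOGRAN, which uses soft logic regularization for interpretable detection of multimodal out-of-context misinformation.	multimodal
6	LocLLM: Exploiting Generalizable Human Keypoint Localization via Large Language Model	Proposes LocLLM, which uses a large language model to localize human keypoints from text descriptions with better generalization	large language model
7	Predictive Dynamic Fusion	Proposes a predictive dynamic fusion framework that addresses instability in multimodal fusion.	multimodal	✅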

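🔬 Pillar 3: Spatial Perception & Semantics (6 papers)

#	Title	Key Point	Tags	🔗	⭐
8	USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation	Proposes USE, a universal segment embedding framework that addresses precise classification in open-vocabulary image segmentation	open-vocabulary open vocabulary foundation model
9	OVMR: Open-Vocabulary Recognition with Multi-Modal References	Proposes OVMR, which uses multimodal reference information for open-vocabulary recognition	open-vocabulary open vocabulary	✅
10	Composition Vision-Language Understanding via Segment and Depth Anything Model	Fuses depth and segmentation models to enhance vision-language understanding	Depth Anything multimodal	✅
11	Multi-style Neural Radiance Field with AdaIN	Proposes a multi-style neural radiance field that combines AdaIN with NeRF for stylized novel view synthesis	NeRF neural radiance field
12	Normal-guided Detail-Preserving Neural Implicit Function for High-Fidelity 3D Surface Reconstruction	Proposes a normal-guided neural implicit function for high-fidelity 3D surface reconstruction, particularly suited to sparse-view settings.	monocular depth implicit representation
13	Ada-VE: Training-Free Consistent Video Editing Using Adaptive Motion Prior	Proposes an adaptive motion prior to keep video edits consistent	optical flow	✅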

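🔬 Pillar 2: RL Algorithms & Architecture (5 papers)

#	Title	Key Point	Tags	🔗	⭐
14	STAR: Skeleton-aware Text-based 4D Avatar Generation with In-Network Motion Retargeting	STAR: proposes a skeleton-aware, text-driven 4D avatar generation method with in-network motion retargeting.	distillation motion retargeting
15	Efficient 3D Shape Generation via Diffusion Mamba with Bidirectional SSMs	Proposes Diffusion Mamba (DiM-3D), which efficiently generates high-resolution 3D shapes and avoids the computational bottleneck of conventional diffusion models.	Mamba SSM
16	Joint Spatial-Temporal Modeling and Contrastive Learning for Self-supervised Heart Rate Measurement	Proposes a self-supervised heart rate measurement method that combines spatial-temporal modeling with contrastive learning; placed second in the RePSS Challenge.	contrastive learning
17	MA-AVT: Modality Alignment for Parameter-Efficient Audio-Visual Transformers	MA-AVT: proposes a parameter-efficient audio-visual Transformer that improves performance through modality alignment.	contrastive learning multimodal
18	Attention Fusion Reverse Distillation for Multi-Lighting Image Anomaly Detection	Proposes Attention Fusion Reverse Distillation (AFRD) for anomaly detection in multi-lighting images.	distillation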

🔬 Pillar 1: Robot Control (3 papers)

#	Title	Key Point	Tags	🔗	⭐
19	Training-Free Video Editing via Optical Flow-Enhanced Score Distillation	Proposes a training-free video editing method based on optical-flow-enhanced score distillation	manipulation distillation optical flow
20	3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination	Introduces the 3D-GRAND dataset, which improves 3D-LLM scene understanding and reduces hallucination	sim-to-real embodied AI large language model
21	Varying Manifolds in Diffusion: From Time-varying Geometries to Visual Saliency	Proposes a generation-rate-based geometric analysis of diffusion models that enables image saliency manipulation and a range of image editing tasks.	manipulation

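🔬 Pillar 7: Motion Retargeting (2 papers)

#	Title	Key Point	Tags	🔗	⭐
22	Diving Deep into the Motion Representation of Video-Text Models	Uses GPT-4 to generate fine-grained motion descriptions, improving how well video-text models understand motion in video	motion representation
23	SMC++: Masked Learning of Unsupervised Video Semantic Compression	Proposes SMC++, a masked-learning framework for unsupervised video semantic compression that improves performance on video analysis tasks	motion prediction	✅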

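🔬 Pillar 5: Interaction & Reaction (1 paper)

#	Title	Key Point	Tags	🔗	⭐
24	SMART: Scene-motion-aware human action recognition framework for mental disorder group	Proposes SMART, a scene-motion-aware action recognition framework for patients with mental disorders, intended for intelligent healthcare video monitoring.	human-scene interaction human motion	✅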

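🔬 Pillar 6: Video Extraction & Matching (1 paper)

#	Title	Key Point	Tags	🔗	⭐
25	ProMotion: Prototypes As Motion Learners	ProMotion: proposes a unified prototype-based motion modeling framework that improves performance across multiple motion tasks	feature matching motion representation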

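🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	Key Point	Tags	🔗	⭐
26	Semantic Segmentation on VSPW Dataset through Masked Video Consistency	Proposes a semantic segmentation method based on masked video consistency that improves performance on the VSPW dataset.	spatiotemporal multimodal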
