cs.CV(2024-07-23)

📊 22 papers in total | 🔗 7 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (Perception & Semantics) (7 🔗2) · Pillar 9: Embodied Large Models (Embodied Foundation Models) (7 🔗3) · Pillar 2: RL Algorithms & Architecture (RL & Architecture) (4 🔗1) · Pillar 6: Video Extraction & Matching (Video Extraction) (2 🔗1) · Pillar 1: Robot Control (1) · Pillar 7: Motion Retargeting (1)

🔬 Pillar 3: Spatial Perception & Semantics (Perception & Semantics) (7 papers)

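# | Title | One-sentence summary | Tags | 🔗
1 | HDRSplat: Gaussian Splatting for High Dynamic Range 3D Scene Reconstruction from Raw Images | HDRSplat: 3D Gaussian splatting scene reconstruction from high dynamic range raw images. | 3D gaussian splatting, 3DGS, gaussian splatting
2 | MicroEmo: Time-Sensitive Multimodal Emotion Recognition with Micro-Expression Dynamics in Video Dialogues | MicroEmo: a time-sensitive multimodal emotion recognition model targeting micro-expression dynamics in video dialogues. | open-vocabulary, open vocabulary, large language model
3 | DHGS: Decoupled Hybrid Gaussian Splatting for Driving Scene | Proposes Decoupled Hybrid Gaussian Splatting (DHGS) to improve novel view synthesis quality for driving scenes. | gaussian splatting, splatting
4 | SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation | SAM-CP: combines SAM with composable prompts for versatile segmentation. | open-vocabulary, open vocabulary, foundation model
5 | ToDER: Towards Colonoscopy Depth Estimation and Reconstruction with Geometry Constraint Adaptation | Proposes ToDER for colonoscopy depth estimation and reconstruction via geometry constraint adaptation. | depth estimation
6 | SINDER: Repairing the Singular Defects of DINOv2 | SINDER repairs the singular defects of DINOv2 through smooth regularization, improving downstream task performance. | depth estimation
7 | VRP-UDF: Towards Unbiased Learning of Unsigned Distance Functions from Multi-view Images with Volume Rendering Priors | Proposes VRP-UDF, which uses volume rendering priors to address bias in learning unsigned distance functions from multi-view images. | implicit representation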

🔬 Pillar 9: Embodied Large Models (Embodied Foundation Models) (7 papers)

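# | Title | One-sentence summary | Tags | 🔗
8 | MLLM-CompBench: A Comparative Reasoning Benchmark for Multimodal LLMs | MLLM-CompBench: a benchmark for evaluating the comparative reasoning ability of multimodal large language models. | large language model, multimodal
9 | PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects | PartGLEE: a part-level foundation model for recognizing and parsing the parts of any object. | foundation model
10 | Histopathology image embedding based on foundation models features aggregation for patient treatment response prediction | Proposes a histopathology image embedding method based on foundation model feature aggregation to predict treatment response in patients with diffuse large B-cell lymphoma. | foundation model
11 | C3T: Cross-modal Transfer Through Time for Sensor-based Human Activity Recognition | C3T: cross-modal transfer through time, improving sensor-based human activity recognition under unsupervised modality adaptation. | multimodal
12 | Unveiling and Mitigating Bias in Audio Visual Segmentation | Proposes a perception module and a contrastive learning strategy to address audio priming bias and visual prior bias in audio-visual segmentation. | visual grounding
13 | Category-Extensible Out-of-Distribution Detection via Hierarchical Context Descriptions | Proposes CATEX, which achieves category-extensible OOD detection through hierarchical context descriptions. | large language model
14 | Harmonizing Visual Text Comprehension and Generation | TextHarmony: proposes Slide-LoRA to unify visual text comprehension and generation tasks. | multimodal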

🔬 Pillar 2: RL Algorithms & Architecture (RL & Architecture) (4 papers)

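# | Title | One-sentence summary | Tags | 🔗
15 | Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions | Proposes a diffusion-model-based monocular depth estimation method that improves robustness in challenging scenes. | distillation, depth estimation, monocular depth
16 | MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence | MovieDreamer: proposes a hierarchical generation framework for movie-level video generation with coherent long visual sequences. | dreamer, multimodal
17 | Accelerating Learned Video Compression via Low-Resolution Representation Learning | Proposes an accelerated video compression framework based on low-resolution representation learning, significantly improving encoding and decoding speed. | representation learning
18 | A Multi-view Mask Contrastive Learning Graph Convolutional Neural Network for Age Estimation | Proposes a multi-view mask contrastive learning graph convolutional network for facial age estimation. | contrastive learning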

🔬 Pillar 6: Video Extraction & Matching (Video Extraction) (2 papers)

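# | Title | One-sentence summary | Tags | 🔗
19 | EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval | EgoCVR: an egocentric benchmark dataset for fine-grained composed video retrieval. | egocentric
20 | Motion Capture from Inertial and Vision Sensors | Proposes the MINIONS dataset and the SparseNet framework for low-cost human motion capture from inertial and vision sensors. | SMPL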

🔬 Pillar 1: Robot Control (1 paper)

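# | Title | One-sentence summary | Tags | 🔗
21 | Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery Detection and Localization | Proposes a coarse-to-fine framework for audio temporal forgery detection and localization, addressing the inability of existing methods to localize tampered segments. | manipulation, representation learning, TAMP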

🔬 Pillar 7: Motion Retargeting (1 paper)

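# | Title | One-sentence summary | Tags | 🔗
22 | VisMin: Visual Minimal-Change Understanding | Proposes the VisMin benchmark for evaluating vision-language models on fine-grained visual understanding. | spatial relationship, large language model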
