cs.CV (2024-08-29)

📊 25 papers in total | 🔗 8 with code

🎯 Areas of interest

Pillar 3: Perception & Semantics (8 🔗2)
Pillar 9: Embodied Foundation Models (7 🔗3)
Pillar 2: RL & Architecture (3)
Pillar 1: Robot Control (2)
Pillar 6: Video Extraction (2 🔗1)
Pillar 7: Motion Retargeting (1 🔗1)
Pillar 5: Interaction & Reaction (1 🔗1)
Pillar 8: Physics-based Animation (1)

🔬 Pillar 3: Perception & Semantics (8 papers)

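# | Title | One-line summary | Tags | 🔗
1 | OmniRe: Omni Urban Scene Reconstruction | OmniRe: builds high-fidelity digital twins of dynamic urban scenes, reconstructing all dynamic foreground actors | 3DGS, gaussian splatting, splatting
2 | ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model | ReconX: reconstructs arbitrary scenes from sparse views using a video diffusion model | 3D gaussian splatting, gaussian splatting, splatting
3 | NeRF-CA: Dynamic Reconstruction of X-ray Coronary Angiography with Extremely Sparse-views | NeRF-CA: a dynamic reconstruction method for X-ray coronary angiography from extremely sparse views | NeRF, neural radiance field
4 | EvLight++: Low-Light Video Enhancement with an Event Camera: A Large-Scale Real-World Dataset, Novel Method, and More | EvLight++: an event-camera-guided low-light video enhancement method, together with a large-scale real-world dataset | depth estimation, monocular depth, foundation model
5 | Creating a Segmented Pointcloud of Grapevines by Combining Multiple Viewpoints Through Visual Odometry | Fuses multiple viewpoints via visual odometry to build a segmented grapevine point cloud for winter pruning | visual odometry
6 | Generic Objects as Pose Probes for Few-shot View Synthesis | Proposes PoseProbe, which uses everyday objects as pose probes to address few-view NeRF reconstruction | NeRF, scene reconstruction
7 | Beyond Uncertainty: Evidential Deep Learning for Robust Video Temporal Grounding | Proposes SRAM, an evidential deep learning module that makes video temporal grounding more robust in open-world settings | open-vocabulary, open vocabulary
8 | Spurfies: Sparse Surface Reconstruction using Local Geometry Priors | Spurfies: a sparse-view surface reconstruction method based on local geometry priors | NeRF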

🔬 Pillar 9: Embodied Foundation Models (7 papers)

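# | Title | One-line summary | Tags | 🔗
9 | Toward Robust Early Detection of Alzheimer's Disease via an Integrated Multimodal Learning Approach | Proposes MSTNet, a multimodal fusion model for early diagnosis of Alzheimer's disease | multimodal
10 | Bootstrap Segmentation Foundation Model under Distribution Shift via Object-Centric Learning | Proposes SlotSAM, which uses object-centric learning to improve how well segmentation foundation models generalize under distribution shift | foundation model
11 | Rethinking Sparse Lexical Representations for Image Retrieval in the Age of Rising Multi-Modal Large Language Models | Uses multimodal large language models to rethink sparse lexical representations for image retrieval | large language model
12 | GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models | Proposes the GradBias framework, which reveals how individual words influence bias in text-to-image generative models | large language model, foundation model
13 | Law of Vision Representation in MLLMs | Proposes a law of vision representation for multimodal large language models (MLLMs) and uses the AC score to optimize the choice of vision representation | large language model, multimodal
14 | CogVLM2: Visual Language Models for Image and Video Understanding | CogVLM2: vision-language models for image and video understanding, supporting high-resolution input and temporal modeling | TAMP
15 | Exploiting temporal information to detect conversational groups in videos and predict the next speaker | Exploits temporal information to detect conversational groups in videos and predict who speaks next | multimodal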

🔬 Pillar 2: RL & Architecture (3 papers)

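# | Title | One-line summary | Tags | 🔗
16 | COIN: Control-Inpainting Diffusion Prior for Human and Camera Motion Estimation | COIN: a controllable inpainting diffusion prior for estimating human and camera motion | distillation, motion diffusion model, motion diffusion
17 | VLM-KD: Knowledge Distillation from VLM for Long-Tail Visual Recognition | Proposes VLM-KD, which distills knowledge from vision-language models to improve long-tail visual recognition | distillation
18 | UDD: Dataset Distillation via Mining Underutilized Regions | UDD: dataset distillation that mines underutilized regions to make better use of the synthetic data | distillation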

🔬 Pillar 1: Robot Control (2 papers)

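# | Title | One-line summary | Tags | 🔗
19 | FastForensics: Efficient Two-Stream Design for Real-Time Image Manipulation Detection | Proposes FastForensics, an efficient two-stream architecture for real-time image manipulation detection | manipulation
20 | Enabling Local Editing in Diffusion Models by Joint and Individual Component Analysis | Proposes a local editing method for diffusion models based on joint and individual component analysis | manipulation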

🔬 Pillar 6: Video Extraction (2 papers)

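# | Title | One-line summary | Tags | 🔗
21 | VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation | VideoLLM-MoD: efficient streaming video-language processing with mixture-of-depths vision computation | Ego4D
22 | 3D Pose-Based Temporal Action Segmentation for Figure Skating: A Fine-Grained and Jump Procedure-Aware Annotation Approach | Proposes 3D pose-based action segmentation and a fine-grained annotation method for figure skating, addressing both the lack of 3D pose datasets and the difficulty of learning jump procedures | markerless motion capture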

🔬 Pillar 7: Motion Retargeting (1 paper)

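# | Title | One-line summary | Tags | 🔗
23 | ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual Grounding | ResVG: strengthens relational and semantic understanding to handle interference from multiple instances in visual grounding | spatial relationship, visual grounding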

🔬 Pillar 5: Interaction & Reaction (1 paper)

# | Title | One-line summary | Tags | 🔗
24 | FineFACE: Fair Facial Attribute Classification Leveraging Fine-grained Features | FineFACE: fair facial attribute classification using fine-grained features | mutual attention

🔬 Pillar 8: Physics-based Animation (1 paper)

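# | Title | One-line summary | Tags | 🔗
25 | Alignment is All You Need: A Training-free Augmentation Strategy for Pose-guided Video Generation | Proposes a training-free augmentation strategy for pose-guided video generation that addresses appearance inconsistency in animation generation | character animation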
