cs.CV (2024-07-11)

📊 34 papers in total | 🔗 14 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (10 🔗4) · Pillar 2: RL Algorithms & Architecture (10 🔗3) · Pillar 9: Embodied Foundation Models (7 🔗5) · Pillar 1: Robot Control (2) · Pillar 4: Generative Motion (2 🔗1) · Pillar 5: Interaction & Reaction (2) · Pillar 6: Video Extraction & Matching (1 🔗1)

🔬 Pillar 3: Spatial Perception & Semantics (10 papers)

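| # | Title | One-line summary | Tags |
|---|---|---|---|
| 1 | WildGaussians: 3D Gaussian Splatting in the Wild | WildGaussians: high-quality, real-time 3D Gaussian splatting in complex scenes. | 3D gaussian splatting, 3DGS, gaussian splatting |
| 2 | ScaleDepth: Decomposing Metric Depth Estimation into Scale Prediction and Relative Depth Estimation | Proposes ScaleDepth, which decomposes monocular depth estimation into scale prediction and relative depth estimation to improve cross-scene generalization. | depth estimation, monocular depth, metric depth |
| 3 | Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation | Proposes CLIPtrase, which recalibrates self-correlation to strengthen CLIP's local feature awareness in open-vocabulary semantic segmentation. | open-vocabulary, open vocabulary |
| 4 | Survey on Fundamental Deep Learning 3D Reconstruction Techniques | Surveys deep learning 3D reconstruction techniques, focusing on NeRF, LDM, and 3D Gaussian splatting. | 3D gaussian splatting, gaussian splatting, splatting |
| 5 | Generalizable Implicit Motion Modeling for Video Frame Interpolation | Proposes Generalizable Implicit Motion Modeling (GIMM) to improve video frame interpolation. | optical flow, motion latent, spatiotemporal |
| 6 | Feasibility of Neural Radiance Fields for Crime Scene Video Reconstruction | Explores the feasibility of neural radiance fields for crime scene video reconstruction. | NeRF, neural radiance field |
| 7 | Explicit-NeRF-QA: A Quality Assessment Database for Explicit NeRF Model Compression | Builds the Explicit-NeRF-QA dataset for assessing the quality of compressed explicit NeRF models. | NeRF, neural radiance field |
| 8 | Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using Large-scale Public Data | MIA: uses large-scale public data to empower bird's-eye-view map construction. | semantic map, first-person view |
| 9 | Event-based vision on FPGAs -- a survey | Surveys event-camera vision on FPGAs for accelerating low-power, real-time embedded applications. | optical flow |
| 10 | Enriching Information and Preserving Semantic Consistency in Expanding Curvilinear Object Segmentation Datasets | Proposes COSTG and SCP ControlNet to expand curvilinear object segmentation datasets while preserving semantic consistency. | semantic map |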

🔬 Pillar 2: RL Algorithms & Architecture (10 papers)

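| # | Title | One-line summary | Tags |
|---|---|---|---|
| 11 | Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification | Proposes the Data Adaptive Traceback (DAT) framework to improve vision-language foundation models on image classification. | contrastive learning, foundation model |
| 12 | MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine | MAVIS: mathematical visual instruction tuning with an automatic data engine, improving the mathematical ability of multimodal large models. | DPO, direct preference optimization, contrastive learning |
| 13 | Emergent Visual-Semantic Hierarchies in Image-Text Representations | Finds that VLMs such as CLIP show an emergent understanding of visual-semantic hierarchies, and proposes the Radial Embedding framework to optimize it. | representation learning, large language model, foundation model |
| 14 | VideoMamba: Spatio-Temporal Selective State Space Model | VideoMamba: a spatio-temporal selective state space model for video recognition. | Mamba, SSM, state space model |
| 15 | SR-Mamba: Effective Surgical Phase Recognition with State Space Model | SR-Mamba: effective surgical phase recognition with a state space model. | Mamba, state space model |
| 16 | GraphMamba: An Efficient Graph Structure Learning Vision Mamba for Hyperspectral Image Classification | Proposes GraphMamba for efficiently learning graph structure and temporal features in hyperspectral image classification. | Mamba, HSI |
| 17 | SliceMamba with Neural Architecture Search for Medical Image Segmentation | Proposes SliceMamba, combined with neural architecture search, to improve medical image segmentation. | Mamba, representation learning |
| 18 | DegustaBot: Zero-Shot Visual Preference Estimation for Personalized Multi-Object Rearrangement | DegustaBot: zero-shot visual preference estimation for personalized multi-object rearrangement. | preference learning, foundation model |
| 19 | Exemplar-free Continual Representation Learning via Learnable Drift Compensation | Proposes learnable drift compensation for exemplar-free continual representation learning. | representation learning |
| 20 | FYI: Flip Your Images for Dataset Distillation | Proposes FYI, which enhances dataset distillation through image flipping and improves semantic expressiveness with few samples. | distillation |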

🔬 Pillar 9: Embodied Foundation Models (7 papers)

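| # | Title | One-line summary | Tags |
|---|---|---|---|
| 21 | SEED-Story: Multimodal Long Story Generation with Large Language Model | SEED-Story: generates long multimodal stories with a multimodal large language model. | large language model, multimodal |
| 22 | DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception | Proposes DenseFusion-1M, which merges knowledge from vision experts to improve multimodal large language models' comprehensive perception of images. | large language model, multimodal |
| 23 | CAR-MFL: Cross-Modal Augmentation by Retrieval for Multimodal Federated Learning with Missing Modalities | Proposes CAR-MFL, which uses cross-modal retrieval augmentation to address missing modalities in multimodal federated learning. | multimodal |
| 24 | 15M Multimodal Facial Image-Text Dataset | Releases FaceCaption-15M, a large-scale multimodal facial image-text dataset, to advance research on face-related tasks. | multimodal |
| 25 | DSCENet: Dynamic Screening and Clinical-Enhanced Multimodal Fusion for MPNs Subtype Classification | DSCENet: dynamic screening and clinical-enhanced multimodal fusion for MPNs subtype classification. | multimodal |
| 26 | Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding | Proposes the HMLLM model, which uses multimodal EEG and eye-tracking data to evaluate heterogeneous responses in video understanding. | large language model |
| 27 | Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models | Live2Diff: proposes a video diffusion model based on uni-directional attention for live-stream video translation. | large language model |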

🔬 Pillar 1: Robot Control (2 papers)

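| # | Title | One-line summary | Tags |
|---|---|---|---|
| 28 | MetaUrban: An Embodied AI Simulation Platform for Urban Micromobility | MetaUrban: an embodied AI simulation platform for urban micromobility that improves the generalizability and safety of AI models. | humanoid, reinforcement learning, imitation learning |
| 29 | MeshAvatar: Learning High-quality Triangular Human Avatars from Multi-view Videos | MeshAvatar: proposes a new method for learning high-quality triangular human avatars from multi-view videos. | manipulation, NeRF, neural radiance field |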

🔬 Pillar 4: Generative Motion (2 papers)

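| # | Title | One-line summary | Tags |
|---|---|---|---|
| 30 | Infinite Motion: Extended Motion Generation via Long Text Instructions | Proposes Infinite Motion, which extends motion generation through long text instructions to synthesize high-quality motion sequences of unlimited duration. | motion synthesis, motion generation |
| 31 | A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights | A comprehensive survey of human video generation: challenges, methods, and future directions. | motion generation |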

🔬 Pillar 5: Interaction & Reaction (2 papers)

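| # | Title | One-line summary | Tags |
|---|---|---|---|
| 32 | NODE-Adapter: Neural Ordinary Differential Equations for Better Vision-Language Reasoning | Proposes NODE-Adapter, which uses neural ordinary differential equations to improve vision-language reasoning. | human-object interaction |
| 33 | Nonverbal Interaction Detection | Proposes NVI-DEHR, a hypergraph-based nonverbal interaction detection model, to address nonverbal behavior understanding in social scenes. | HOI |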

🔬 Pillar 6: Video Extraction & Matching (1 paper)

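| # | Title | One-line summary | Tags |
|---|---|---|---|
| 34 | WalkTheDog: Cross-Morphology Motion Alignment via Phase Manifolds | Proposes a phase-manifold-based method for cross-morphology motion alignment that enables motion transfer between characters with different skeletal structures. | motion matching, motion retrieval |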
