cs.CV (2024-08-30)

📊 18 papers in total | 🔗 5 with code

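🎯 Interest Area Navigation

Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (9 🔗3) · Pillar 2: RL Algorithms and Architecture (RL & Architecture) (4 🔗1) · Pillar 9: Embodied Large Models (Embodied Foundation Models) (2) · Pillar 6: Video Extraction and Matching (Video Extraction) (1) · Pillar 4: Generative Motion (1 🔗1) · Pillar 8: Physics-based Animation (1)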

🔬 Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (9 papers)

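| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 1 | DARES: Depth Anything in Robotic Endoscopic Surgery with Self-supervised Vector-LoRA of the Foundation Model | DARES: improves the Depth Anything model for robotic endoscopic surgery using self-supervised Vector-LoRA. | depth estimation, monocular depth, Depth Anything | |
| 2 | Open-Vocabulary Action Localization with Iterative Visual Prompting | Proposes an open-vocabulary action localization method based on iterative visual prompting that localizes actions in video without training. | open-vocabulary, open vocabulary | |
| 3 | AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding | AdaptVision: dynamic input scaling in MLLMs for versatile scene understanding. | scene understanding, large language model, multimodal | |
| 4 | UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios | UrBench: a comprehensive benchmark for evaluating large models in multi-view urban scenarios. | scene understanding, multimodal | |
| 5 | Synthetic Lunar Terrain: A Multimodal Open Dataset for Training and Evaluating Neuromorphic Vision Algorithms | Proposes the Synthetic Lunar Terrain (SLT) multimodal open dataset for training and evaluating neuromorphic vision algorithms. | depth estimation, multimodal | |
| 6 | OG-Mapping: Octree-based Structured 3D Gaussians for Online Dense Mapping | OG-Mapping: an online dense mapping method based on octree-structured 3D Gaussians. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 7 | 2DGH: 2D Gaussian-Hermite Splatting for High-quality Rendering and Better Geometry Reconstruction | Proposes 2D Gaussian splatting with Gaussian-Hermite kernels to improve rendering quality and geometry reconstruction. | gaussian splatting, splatting | |
| 8 | BOP-Distrib: Revisiting 6D Pose Estimation Benchmarks for Better Evaluation under Visual Ambiguities | BOP-Distrib: revisits 6D pose estimation benchmarks to improve evaluation quality under visual ambiguities. | 6D pose estimation | |
| 9 | ConDense: Consistent 2D/3D Pre-training for Dense and Sparse Features from Multi-View Images | Proposes the ConDense framework to address feature consistency in 3D foundation model training. | NeRF, foundation model | |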

🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (4 papers)

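| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 10 | VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters | VisionTS: uses visual masked autoencoders for zero-shot time series forecasting. | masked autoencoder, large language model, foundation model | |
| 11 | Stochastic Layer-Wise Shuffle for Improving Vision Mamba Training | Proposes a stochastic layer-wise shuffle method that improves Vision Mamba training on ImageNet. | Mamba | |
| 12 | Instant Adversarial Purification with Adversarial Consistency Distillation | Proposes OSCP, which uses adversarial consistency distillation to purify adversarial examples with a single-step diffusion model, significantly improving efficiency. | distillation | |
| 13 | Contrastive Learning with Synthetic Positives | Proposes CLSP, which uses synthetic images as additional positives in contrastive learning to improve self-supervised learning performance. | contrastive learning | |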

🔬 Pillar 9: Embodied Large Models (Embodied Foundation Models) (2 papers)

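| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 14 | NanoMVG: USV-Centric Low-Power Multi-Task Visual Grounding based on Prompt-Guided Camera and 4D mmWave Radar | NanoMVG: a low-power multi-task visual grounding model for USVs that fuses prompt-guided camera and 4D mmWave radar. | visual grounding | |
| 15 | From Latent to Engine Manifolds: Analyzing ImageBind's Multimodal Embedding Space | Uses ImageBind to analyze the multimodal embedding space and generate meaningful fused embeddings for online auto parts listings. | multimodal | |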

🔬 Pillar 6: Video Extraction and Matching (Video Extraction) (1 paper)

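| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 16 | EMHI: A Multimodal Egocentric Human Motion Dataset with HMD and Body-Worn IMUs | Proposes the EMHI multimodal dataset for egocentric human motion estimation from a head-mounted display and IMUs in VR/AR. | SMPL, egocentric, multimodal | |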

🔬 Pillar 4: Generative Motion (1 paper)

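| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 17 | TIMotion: Temporal and Interactive Framework for Efficient Human-Human Motion Generation | TIMotion: proposes a temporal and interactive framework for efficiently generating human-human interaction motion. | motion generation | |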

🔬 Pillar 8: Physics-based Animation (1 paper)

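| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 18 | Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning | Proposes video-level blending data and a spatiotemporal adapter to improve the generalization of deepfake video detection. | spatiotemporal | |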
