cs.CV (2024-08-20)

📊 30 papers in total | 🔗 11 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (11 🔗3) · Pillar 9: Embodied Foundation Models (7 🔗3) · Pillar 2: RL Algorithms & Architecture (6 🔗2) · Pillar 1: Robot Control (2 🔗1) · Pillar 5: Interaction & Reaction (1) · Pillar 6: Video Extraction & Matching (1 🔗1) · Pillar 4: Generative Motion (1 🔗1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 3: Spatial Perception & Semantics (11 papers)

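# | Title | Key point | Tags | 🔗
1 | GS-CPR: Efficient Camera Pose Refinement via 3D Gaussian Splatting | GS-CPR: efficient camera pose refinement using 3D Gaussian Splatting | 3D gaussian splatting, 3DGS, gaussian splatting |
2 | OpenScan: A Benchmark for Generalized Open-Vocabulary 3D Scene Understanding | Proposes the OpenScan benchmark for generalized open-vocabulary 3D scene understanding | scene understanding, open-vocabulary, open vocabulary |
3 | Near, far: Patch-ordering enhances vision foundation models' scene understanding | Proposes the NeCo loss, which improves vision foundation models' scene understanding through patch ordering | scene understanding, foundation model |
4 | SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition | Enhances Emotion-LLaMA with Conv-Attention to improve multimodal emotion recognition performance | open-vocabulary, open vocabulary, multimodal |
5 | On the Potential of Open-Vocabulary Models for Object Detection in Unusual Street Scenes | Evaluates the potential of open-vocabulary models for object detection in unusual street scenes and reveals their limitations in open-world settings | open-vocabulary, open vocabulary |
6 | Lightweight Modular Parameter-Efficient Tuning for Open-Vocabulary Object Detection | Proposes UniProj-Det, a lightweight, modular, parameter-efficient framework for open-vocabulary object detection | open-vocabulary, open vocabulary |
7 | TrackNeRF: Bundle Adjusting NeRF from Sparse and Noisy Views via Feature Tracks | TrackNeRF: bundle-adjusts NeRF via feature tracks to address reconstruction from sparse and noisy views | NeRF, neural radiance field |
8 | DEGAS: Detailed Expressions on Full-Body Gaussian Avatars | Proposes DEGAS to model detailed expressions on full-body Gaussian avatars | 3D gaussian splatting, 3DGS, gaussian splatting |
9 | Open 3D World in Autonomous Driving | Proposes an open-vocabulary perception method for autonomous driving that fuses 3D point clouds with text information | open-vocabulary, open vocabulary, multimodal |
10 | Vocabulary-Free 3D Instance Segmentation with Vision and Language Assistant | Proposes PoVo, described as the first vocabulary-free 3D instance segmentation method, which uses a vision-language assistant for open scene understanding | open-vocabulary, open vocabulary |
11 | PooDLe: Pooled and dense self-supervised learning from naturalistic videos | PooDLe: combines pooled and dense self-supervised learning to learn representations from naturalistic videos | optical flow |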

🔬 Pillar 9: Embodied Foundation Models (7 papers)

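# | Title | Key point | Tags | 🔗
12 | Large Language Models for Multimodal Deformable Image Registration | Proposes the LLM-Morph framework, which uses large language models for multimodal deformable image registration | large language model, multimodal |
13 | FLAME: Learning to Navigate with Multimodal LLM in Urban Environments | FLAME: a method for learning to navigate urban environments with a multimodal LLM | VLN, large language model, multimodal |
14 | ISLES'24 -- A Real-World Longitudinal Multimodal Stroke Dataset | ISLES'24 releases a real-world longitudinal multimodal stroke dataset to support the development of machine learning algorithms | multimodal |
15 | ISLES'24: Final Infarct Prediction with Multimodal Imaging and Clinical Data. Where Do We Stand? | ISLES'24 challenge: predicts final infarcts from multimodal imaging and clinical data and reveals current technical bottlenecks | multimodal |
16 | ViLReF: An Expert Knowledge Enabled Vision-Language Retinal Foundation Model | Proposes ViLReF, an expert-knowledge-driven vision-language pretrained model for retinal images | foundation model |
17 | HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models | HiRED: an attention-guided token-dropping method for efficient inference of high-resolution vision-language models | large language model, multimodal |
18 | Tapping in a Remote Vehicle's onboard LLM to Complement the Ego Vehicle's Field-of-View | Uses a remote vehicle's onboard LLM to extend the ego vehicle's field of view and improve traffic safety | large language model |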

🔬 Pillar 2: RL Algorithms & Architecture (6 papers)

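# | Title | Key point | Tags | 🔗
19 | SenPa-MAE: Sensor Parameter Aware Masked Autoencoder for Multi-Satellite Self-Supervised Pretraining | Proposes SenPa-MAE for self-supervised pretraining on multi-satellite remote sensing imagery, addressing cross-sensor data fusion | masked autoencoder, MAE, foundation model |
20 | ShapeSplat: A Large-scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining | ShapeSplat: a large-scale Gaussian splat dataset and its self-supervised pretraining | representation learning, MAE, 3D gaussian splatting |
21 | MambaEVT: Event Stream based Visual Object Tracking using State Space Model | Proposes MambaEVT, an event-stream visual object tracking framework based on the Mamba state space model | Mamba, state space model |
22 | MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval | Proposes MUSE, an efficient Mamba-based multi-scale model for text-video retrieval | Mamba |
23 | Adaptive Knowledge Distillation for Classification of Hand Images using Explainable Vision Transformers | Proposes an adaptive knowledge distillation method based on explainable Vision Transformers for hand image classification | distillation |
24 | Event Stream-based Sign Language Translation: A High-Definition Benchmark Dataset and A Novel Baseline | Proposes the Event-CSL event-stream sign language translation dataset and the EvSLT baseline model, addressing lighting and privacy issues | Mamba, spatiotemporal |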

🔬 Pillar 1: Robot Control (2 papers)

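# | Title | Key point | Tags | 🔗
25 | Self-supervised Learning of Hybrid Part-aware 3D Representations of 2D Gaussians and Superquadrics | PartGS: proposes a self-supervised hybrid representation learning framework for part-level parsing and reconstruction of 3D scenes | manipulation, NeRF |
26 | A Gray-box Attack against Latent Diffusion Model-based Image Editing by Posterior Collapse | Proposes the Posterior Collapse Attack (PCA) to protect images from unauthorized LDM-based editing | manipulation |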

🔬 Pillar 5: Interaction & Reaction (1 paper)

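# | Title | Key point | Tags | 🔗
27 | A Review of Human-Object Interaction Detection | Reviews human-object interaction detection methods for images and analyzes challenges and future trends | human-object interaction, HOI |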

🔬 Pillar 6: Video Extraction & Matching (1 paper)

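# | Title | Key point | Tags | 🔗
28 | Multi-view Hand Reconstruction with a Point-Embedded Transformer | Proposes POEM, which uses a point-embedded Transformer for generalizable multi-view hand mesh reconstruction | HMR, hand reconstruction |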

🔬 Pillar 4: Generative Motion (1 paper)

# | Title | Key point | Tags | 🔗
29 | CrossFi: A Cross Domain Wi-Fi Sensing Framework Based on Siamese Network | Proposes CrossFi, a cross-domain Wi-Fi sensing framework based on a Siamese network, addressing domain shift | penetration |

🔬 Pillar 8: Physics-based Animation (1 paper)

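# | Title | Key point | Tags | 🔗
30 | A Noncontact Technique for Wave Measurement Based on Thermal Stereography and Deep Learning | Proposes a noncontact wave measurement technique based on thermal stereography and deep learning | spatiotemporal |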
