cs.CV (2024-10-11)

📊 24 papers in total | 🔗 7 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (12 🔗3) · Pillar 3: Perception & Semantics (7 🔗2) · Pillar 2: RL & Architecture (3 🔗2) · Pillar 5: Interaction & Reaction (1) · Pillar 1: Robot Control (1)

🔬 Pillar 9: Embodied Foundation Models (12 papers)

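| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models | SPORTU: a comprehensive benchmark for evaluating the sports understanding capabilities of multimodal large language models | large language model, multimodal, chain-of-thought | |
| 2 | Foundation Model-Powered 3D Few-Shot Class Incremental Learning via Training-free Adaptor | Proposes a training-free adaptor built on pretrained 3D models to address few-shot class-incremental learning on 3D point clouds | foundation model | |
| 3 | MiRAGeNews: Multimodal Realistic AI-Generated News Detection | Proposes the MiRAGeNews dataset and the MiRAGe detector for detecting AI-generated multimodal news content | multimodal | |
| 4 | Movie Trailer Genre Classification Using Multimodal Pretrained Features | Proposes a new method for movie trailer genre classification based on multimodal pretrained features | multimodal | |
| 5 | A foundation model for generalizable disease diagnosis in chest X-ray images | CXRBase: a general-purpose foundation model for disease diagnosis in chest X-ray images | foundation model | |
| 6 | Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping | Proposes VLB, a dynamic multimodal evaluation framework that addresses data contamination and fixed complexity in LVLM evaluation | multimodal | |
| 7 | VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understanding | Proposes VERIFIED, a video moment retrieval benchmark for fine-grained video understanding | large language model, foundation model, multimodal | |
| 8 | Can GPTs Evaluate Graphic Design Based on Design Principles? | Studies GPT's ability to evaluate graphic design, comparing heuristic evaluation based on design principles with human annotations | foundation model, multimodal | |
| 9 | Hespi: A pipeline for automatically detecting information from hebarium specimen sheets | Hespi: a data extraction pipeline that automatically detects information on plant specimens | large language model, multimodal | |
| 10 | Chain-of-Restoration: Multi-Task Image Restoration Models are Zero-Shot Step-by-Step Universal Image Restorers | Proposes Chain-of-Restoration, enabling multi-task image restoration models to perform zero-shot, step-by-step universal image restoration | large language model, chain-of-thought | |
| 11 | Zero-Shot Pupil Segmentation with SAM 2: A Case Study of Over 14 Million Images | Uses SAM 2 for zero-shot pupil segmentation, achieving performance comparable to specialized models on over 14 million images | foundation model | |
| 12 | Multi-modal Fusion based Q-distribution Prediction for Controlled Nuclear Fusion | Proposes a multimodal-fusion-based Q-distribution prediction method that improves prediction accuracy for controlled nuclear fusion | multimodal | |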

🔬 Pillar 3: Perception & Semantics (7 papers)

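| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 13 | SurgicalGS: Dynamic 3D Gaussian Splatting for Accurate Robotic-Assisted Surgical Scene Reconstruction | SurgicalGS: dynamic 3D Gaussian splatting for accurate robotic-assisted surgical scene reconstruction | 3D gaussian splatting, gaussian splatting, splatting | |
| 14 | Learning Interaction-aware 3D Gaussian Splatting for One-shot Hand Avatars | Proposes an interaction-aware 3D Gaussian splatting framework for generating and animating hand avatars from a single image | 3D gaussian splatting, gaussian splatting, splatting | |
| 15 | MeshGS: Adaptive Mesh-Aligned Gaussian Splatting for High-Quality Rendering | MeshGS: proposes adaptive mesh-aligned Gaussian splatting for high-quality rendering | 3D gaussian splatting, gaussian splatting, splatting | |
| 16 | Look Gauss, No Pose: Novel View Synthesis using Gaussian Splatting without Accurate Pose Initialization | Proposes NoPoseGS, achieving high-accuracy novel view synthesis without accurate pose initialization | 3D gaussian splatting, gaussian splatting, splatting | |
| 17 | Boosting Open-Vocabulary Object Detection by Handling Background Samples | Proposes BIRDet, which improves open-vocabulary object detection by handling background samples | open-vocabulary, open vocabulary | |
| 18 | VOVTrack: Exploring the Potentiality in Videos for Open-Vocabulary Object Tracking | VOVTrack: exploits the potential of videos to tackle open-vocabulary object tracking | open-vocabulary, open vocabulary | |
| 19 | Ego3DT: Tracking Every 3D Object in Ego-centric Videos | Ego3DT: proposes a zero-shot method for tracking all 3D objects in egocentric videos | scene reconstruction | |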

🔬 Pillar 2: RL & Architecture (3 papers)

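| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 20 | SmartPretrain: Model-Agnostic and Dataset-Agnostic Representation Learning for Motion Prediction | Proposes SmartPretrain to address data scarcity in motion prediction | representation learning, MAE, spatiotemporal | |
| 21 | Semantic Score Distillation Sampling for Compositional Text-to-3D Generation | Proposes SemanticSDS, improving expressiveness and accuracy for complex scenes in text-to-3D generation | distillation, semantic map | |
| 22 | Enabling Advanced Land Cover Analytics: An Integrated Data Extraction Pipeline for Predictive Modeling with the Dynamic World Dataset | Proposes an integrated data extraction pipeline that enables advanced land cover analytics and predictive modeling with the Dynamic World dataset | predictive model | |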

🔬 Pillar 5: Interaction & Reaction (1 paper)

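| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 23 | HorGait: A Hybrid Model for Accurate Gait Recognition in LiDAR Point Cloud Planar Projections | HorGait: a hybrid model for accurate gait recognition in planar projections of LiDAR point clouds | ReMoS | |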

🔬 Pillar 1: Robot Control (1 paper)

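| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 24 | Facial Chick Sexing: An Automated Chick Sexing System From Chick Facial Image | Proposes an automated chick sexing system based on facial images, improving efficiency and animal welfare | manipulation | |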
