cs.CV (2024-10-21)

📊 38 papers in total | 🔗 12 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (15 🔗4) · Pillar 2: RL Algorithms & Architecture (14 🔗4) · Pillar 3: Spatial Perception & Semantics (5 🔗2) · Pillar 7: Motion Retargeting (2 🔗1) · Pillar 6: Video Extraction & Matching (1 🔗1) · Pillar 1: Robot Control (1)

🔬 Pillar 9: Embodied Foundation Models (15 papers)

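| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models | Proposes Griffon-G, a large multimodal model that unifies vision-language and vision-centric tasks | large language model, multimodal, instruction following | |
| 2 | PlaneSAM: Multimodal Plane Instance Segmentation Using the Segment Anything Model | PlaneSAM: multimodal plane instance segmentation using the Segment Anything Model | multimodal, zero-shot transfer | |
| 3 | Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance | Mini-InternVL: a flexibly transferable multimodal model that reaches 90% of the performance with 5% of the parameters | large language model, multimodal | |
| 4 | Domain-Adaptive Pre-training of Self-Supervised Foundation Models for Medical Image Classification in Gastrointestinal Endoscopy | Proposes a domain-adaptive pre-training method that improves medical image classification in gastrointestinal endoscopy | foundation model | |
| 5 | Benchmarking Pathology Foundation Models: Adaptation Strategies and Scenarios | Benchmarks pathology foundation models across adaptation strategies and application scenarios | foundation model | |
| 6 | Foundation Models for Slide-level Cancer Subtyping in Digital Pathology | Uses domain-pretrained foundation models to improve slide-level cancer subtyping in digital pathology | foundation model | |
| 7 | Joint Top-Down and Bottom-Up Frameworks for 3D Visual Grounding | Proposes a joint top-down and bottom-up framework to improve 3D visual grounding | visual grounding | |
| 8 | Multimodal Learning for Embryo Viability Prediction in Clinical IVF | Proposes a multimodal learning model for embryo viability prediction in clinical IVF | multimodal | |
| 9 | Beyond Filtering: Adaptive Image-Text Quality Enhancement for MLLM Pretraining | Proposes AITQE, an adaptive image-text quality enhancer that improves the quality of pretraining data for multimodal large language models | large language model, multimodal | |
| 10 | SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree | Proposes SAM2Long, which improves SAM 2 on long video segmentation with a training-free memory tree | foundation model | |
| 11 | Mitigating Object Hallucination via Concentric Causal Attention | Proposes Concentric Causal Attention (CCA) to mitigate object hallucination in large vision-language models | multimodal | |
| 12 | Reducing Hallucinations in Vision-Language Models via Latent Space Steering | Proposes VTI, which reduces hallucinations in vision-language models via latent space steering | large language model | |
| 13 | Improving Instance Optimization in Deformable Image Registration with Gradient Projection | Proposes a gradient-projection instance optimization method for deformable image registration, improving registration accuracy and stability | foundation model | |
| 14 | When LLMs Learn to be Students: The SOEI Framework for Modeling and Evaluating Virtual Student Agents in Educational Interaction | Proposes the SOEI framework for building and evaluating LLM-based virtual student agents in educational interaction | large language model | |
| 15 | Deep Learning and Machine Learning -- Object Detection and Semantic Segmentation: From Theory to Applications | Surveys object detection and semantic segmentation, combining theory with applications and exploring frontier deep learning techniques | large language model | |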

🔬 Pillar 2: RL Algorithms & Architecture (14 papers)

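| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 16 | CL-HOI: Cross-Level Human-Object Interaction Distillation from Vision Large Language Models | Proposes the CL-HOI framework, which distills from vision large language models to achieve annotation-free human-object interaction detection | distillation, human-object interaction, HOI | |
| 17 | LLaVA-KD: A Framework of Distilling Multimodal Large Language Models | LLaVA-KD: a framework for distilling multimodal large language models | distillation, large language model, multimodal | |
| 18 | Few-shot target-driven instance detection based on open-vocabulary object detection models | Proposes a lightweight method that uses open-vocabulary object detection models for few-shot target-driven instance detection | world model, open-vocabulary, open vocabulary | |
| 19 | START: A Generalized State Space Model with Saliency-Driven Token-Aware Transformation | Proposes START, a state space model with saliency-driven token-aware transformation, to improve domain generalization | Mamba, SSM, state space model | |
| 20 | MBPU: A Plug-and-Play State Space Model for Point Cloud Upsamping with Fast Point Rendering | Proposes MBPU, a Mamba-based network for large-scale point cloud upsampling with fewer artifacts | Mamba, state space model | |
| 21 | Exploring Stronger Transformer Representation Learning for Occluded Person Re-Identification | Proposes SSSC-TransReID, which strengthens Transformer feature representations for person re-identification under occlusion | representation learning, contrastive learning | |
| 22 | Joker: Conditional 3D Head Synthesis with Extreme Facial Expressions | Joker: 3D head synthesis with extreme facial expressions, based on a conditional diffusion model | distillation, NeRF, neural radiance field | |
| 23 | YOLO11 and Vision Transformers based 3D Pose Estimation of Immature Green Fruits in Commercial Apple Orchards for Robotic Thinning | Proposes a YOLO11 and Vision Transformer based method for 3D pose estimation of immature apple fruitlets, for use in robotic thinning | MAE, depth estimation, Depth Anything | |
| 24 | LMHaze: Intensity-aware Image Dehazing with a Large-scale Multi-intensity Real Haze Dataset | Proposes LMHaze, a large-scale real-world haze dataset, and designs a MoE-Mamba model to improve image dehazing | Mamba, multimodal | |
| 25 | Robust Visual Representation Learning with Multi-modal Prior Knowledge for Image Classification Under Distribution Shift | Proposes KGV, a knowledge-guided visual representation learning method that improves the generalization of image classification under distribution shift | representation learning | |
| 26 | Learning from Neighbors: Category Extrapolation for Long-Tail Learning | Proposes a neighbor-based category extrapolation method to address poor generalization on tail classes in long-tail learning | representation learning, large language model | |
| 27 | Are Large-scale Soft Labels Necessary for Large-scale Dataset Distillation? | Proposes a dataset distillation method with intra-class supervision that substantially reduces soft label size while improving performance | distillation | |
| 28 | Contrastive Learning with Auxiliary User Detection for Identifying Activities | Proposes the CLAUDIA framework, which uses contrastive learning with auxiliary user detection to improve user- and context-aware human activity recognition | contrastive learning | |
| 29 | TIPS: Text-Image Pretraining with Spatial awareness | Proposes TIPS to address insufficient spatial awareness in image-text representation learning | representation learning, depth estimation | |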

🔬 Pillar 3: Spatial Perception & Semantics (5 papers)

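| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 30 | 3DGS-Enhancer: Enhancing Unbounded 3D Gaussian Splatting with View-consistent 2D Diffusion Priors | 3DGS-Enhancer: enhances unbounded 3D Gaussian splatting with view-consistent 2D diffusion priors | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 31 | Fully Explicit Dynamic Gaussian Splatting | Proposes explicit 4D Gaussian splatting (Ex4DGS) for fast, high-quality rendering of dynamic scenes | 3D gaussian splatting, gaussian splatting, splatting | |
| 32 | FrugalNeRF: Fast Convergence for Extreme Few-shot Novel View Synthesis without Learned Priors | FrugalNeRF: fast convergence for extreme few-shot novel view synthesis without learned priors | NeRF, neural radiance field, scene reconstruction | |
| 33 | Zero-Shot Scene Reconstruction from Single Images with Deep Prior Assembly | Proposes a deep prior assembly framework for zero-shot scene reconstruction from a single image | scene reconstruction | |
| 34 | Focus on BEV: Self-calibrated Cycle View Transformation for Monocular Birds-Eye-View Segmentation | FocusBEV: a self-calibrated cycle view transformation method for monocular BEV segmentation | semantic map, spatiotemporal | |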

🔬 Pillar 7: Motion Retargeting (2 papers)

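| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 35 | MvDrag3D: Drag-based Creative 3D Editing via Multi-view Generation-Reconstruction Priors | MvDrag3D: drag-based creative 3D editing via multi-view generation-reconstruction priors | latent optimization | |
| 36 | Revisiting Deep Feature Reconstruction for Logical and Structural Industrial Anomaly Detection | Proposes ULSAD, which combines deep feature reconstruction with an attention mechanism for logical and structural industrial anomaly detection | spatial relationship | |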

🔬 Pillar 6: Video Extraction & Matching (1 paper)

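| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 37 | ARTS: Semi-Analytical Regressor using Disentangled Skeletal Representations for Human Mesh Recovery from Videos | ARTS: a semi-analytical regressor that uses disentangled skeletal representations for human mesh recovery from videos | human mesh recovery, human motion | |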

🔬 Pillar 1: Robot Control (1 paper)

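| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 38 | DeepIcon: A Hierarchical Network for Layer-wise Icon Vectorization | Proposes DeepIcon, which vectorizes raster images layer by layer to produce variable-length icon vector graphics | manipulation | |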
