cs.CV (2024-08-27)

📊 27 papers in total | 🔗 9 with code

🎯 Navigation by Interest Area

Pillar 3: Spatial Perception & Semantics (10 🔗2) · Pillar 2: RL Algorithms & Architecture (7 🔗3) · Pillar 9: Embodied Foundation Models (6 🔗3) · Pillar 5: Interaction & Reaction (2 🔗1) · Pillar 7: Motion Retargeting (1) · Pillar 6: Video Extraction & Matching (1)

🔬 Pillar 3: Spatial Perception & Semantics (10 papers)

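# | Title | One-sentence summary | Tags | 🔗
1 | LapisGS: Layered Progressive 3D Gaussian Splatting for Adaptive Streaming | Proposes LapisGS to address 3D content streaming in XR environments. | 3D gaussian splatting, 3DGS, gaussian splatting
2 | Drone-assisted Road Gaussian Splatting with Cross-view Uncertainty | Proposes a drone-assisted Gaussian splatting method for road scenes based on cross-view uncertainty, improving rendering quality for road views. | 3D gaussian splatting, gaussian splatting, splatting
3 | RSTeller: Scaling Up Visual Language Modeling in Remote Sensing with Rich Linguistic Semantics from Openly Available Data and Large Language Models | RSTeller: scales up vision-language modeling for remote sensing using open data and large language models. | scene understanding, large language model, multimodal
4 | Adversarial Manhole: Challenging Monocular Depth Estimation and Semantic Segmentation Models with Patch Attack | Proposes an adversarial attack based on manhole-cover patches to fool monocular depth estimation and semantic segmentation models. | depth estimation, monocular depth
5 | MROVSeg: Breaking the Resolution Curse of Vision-Language Models in Open-Vocabulary Image Segmentation | MROVSeg: breaks the resolution bottleneck of vision-language models in open-vocabulary image segmentation. | open-vocabulary, open vocabulary
6 | Learning-based Multi-View Stereo: A Survey | Surveys learning-based multi-view stereo methods, focusing on depth-map-based methods and outlining future directions. | 3D gaussian splatting, gaussian splatting, splatting
7 | MMASD+: A Novel Dataset for Privacy-Preserving Behavior Analysis of Children with Autism Spectrum Disorder | MMASD+: a privacy-preserving multimodal dataset and Transformer framework for behavior analysis of children with autism. | optical flow, multimodal
8 | GeoTransfer : Generalizable Few-Shot Multi-View Reconstruction via Transfer Learning | GeoTransfer: generalizable few-shot multi-view reconstruction via transfer learning. | NeRF, neural radiance field
9 | Interactive Occlusion Boundary Estimation through Exploitation of Synthetic Data | Proposes the MS³PE framework, which improves occlusion boundary estimation accuracy through interactive scribbles and synthetic data. | scene understanding
10 | BOX3D: Lightweight Camera-LiDAR Fusion for 3D Object Detection and Localization | BOX3D: a lightweight camera-LiDAR fusion approach for 3D object detection and localization. | scene understanding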

🔬 Pillar 2: RL Algorithms & Architecture (7 papers)

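# | Title | One-sentence summary | Tags | 🔗
11 | Text-guided Foundation Model Adaptation for Long-Tailed Medical Image Classification | Proposes TFA-LT, which addresses long-tailed medical image classification through text-guided fine-tuning. | representation learning, foundation model, multimodal
12 | Multi-Modal Instruction-Tuning Small-Scale Language-and-Vision Assistant for Semiconductor Electron Micrograph Analysis | Proposes a multimodal instruction-tuning framework for semiconductor electron micrograph analysis. | teacher-student, large language model, multimodal
13 | MTMamba++: Enhancing Multi-Task Dense Scene Understanding via Mamba-Based Decoders | MTMamba++: a multi-task dense scene understanding framework based on Mamba decoders. | Mamba, scene understanding
14 | ZeroMamba: Exploring Visual State Space Model for Zero-Shot Learning | Proposes ZeroMamba, which uses a visual state space model to improve zero-shot learning performance. | Mamba, state space model, representation learning
15 | Mamba2MIL: State Space Duality Based Multiple Instance Learning for Computational Pathology | Mamba2MIL: multiple instance learning based on state space duality for computational pathology. | Mamba, SSM
16 | Few-Shot Unsupervised Implicit Neural Shape Representation Learning with Spatial Adversaries | Proposes a few-shot unsupervised method for learning implicit neural shape representations based on spatial adversaries. | representation learning
17 | MeshUp: Multi-Target Mesh Deformation via Blended Score Distillation | MeshUp: proposes a multi-target mesh deformation method based on blended score distillation. | distillation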

🔬 Pillar 9: Embodied Foundation Models (6 papers)

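# | Title | One-sentence summary | Tags | 🔗
18 | From Bias to Balance: Detecting Facial Expression Recognition Biases in Large Multimodal Foundation Models | Reveals racial bias in facial expression recognition by large multimodal models. | foundation model, multimodal
19 | DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding | Proposes DocLayLLM, an efficient multimodal extension of large language models for text-rich document understanding. | large language model, multimodal, chain-of-thought
20 | Leveraging Hallucinations to Reduce Manual Prompt Dependency in Promptable Segmentation | Leverages large language model hallucinations to improve promptable segmentation accuracy and reduce dependence on manual prompts. | large language model, multimodal, chain-of-thought
21 | Snap and Diagnose: An Advanced Multimodal Retrieval System for Identifying Plant Diseases in the Wild | Proposes a CLIP-based multimodal retrieval system for plant diseases to support rapid diagnosis in the field. | multimodal
22 | HPT++: Hierarchically Prompting Vision-Language Models with Multi-Granularity Knowledge Generation and Improved Structure Modeling | HPT++: hierarchically prompts vision-language models through multi-granularity knowledge generation and improved structure modeling. | large language model, foundation model
23 | NeuralOOD: Improving Out-of-Distribution Generalization Performance with Brain-machine Fusion Learning Framework | Proposes NeuralOOD, which uses a brain-machine fusion learning framework to improve generalization on out-of-distribution data. | multimodal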

🔬 Pillar 5: Interaction & Reaction (2 papers)

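# | Title | One-sentence summary | Tags | 🔗
24 | Hierarchical Graph Interaction Transformer with Dynamic Token Clustering for Camouflaged Object Detection | Proposes HGINet, which detects camouflaged objects using a hierarchical graph interaction Transformer and dynamic token clustering. | interaction transformer
25 | DCT-CryptoNets: Scaling Private Inference in the Frequency Domain | DCT-CryptoNets: proposes a frequency-domain private inference method that accelerates homomorphically encrypted neural networks. | OMOMO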

🔬 Pillar 7: Motion Retargeting (1 paper)

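# | Title | One-sentence summary | Tags | 🔗
26 | A Review of Transformer-Based Models for Computer Vision Tasks: Capturing Global Context and Spatial Relationships | Surveys Transformer applications in computer vision, exploring global context modeling and the capture of spatial relationships. | spatial relationship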

🔬 Pillar 6: Video Extraction & Matching (1 paper)

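# | Title | One-sentence summary | Tags | 🔗
27 | DiffSurf: A Transformer-based Diffusion Model for Generating and Reconstructing 3D Surfaces in Pose | DiffSurf: proposes a Transformer-based diffusion model for generating and reconstructing 3D surfaces in pose. | human mesh recovery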
