cs.CV (2025-04-12)

📊 16 papers in total | 🔗 2 with code

🎯 Topic navigation

Pillar 3: Perception & Semantics (6, 🔗 2) · Pillar 9: Embodied Foundation Models (6) · Pillar 2: RL & Architecture (2) · Pillar 1: Robot Control (1) · Pillar 7: Motion Retargeting (1)

🔬 Pillar 3: Perception & Semantics (6 papers)

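# | Title | One-line summary | Tags | 🔗
1 | You Need a Transition Plane: Bridging Continuous Panoramic 3D Reconstruction with Perspective Gaussian Splatting | Proposes the TPGS framework to address projection distortion in panoramic 3D reconstruction. | 3D gaussian splatting, 3DGS, gaussian splatting
2 | A Constrained Optimization Approach for Gaussian Splatting from Coarsely-posed Images and Noisy Lidar Point Clouds | Proposes a constrained-optimization-based Gaussian splatting method for reconstructing scenes from coarsely posed images and noisy LiDAR point clouds. | 3D gaussian splatting, 3DGS, gaussian splatting
3 | BlockGaussian: Efficient Large-Scale Scene Novel View Synthesis via Adaptive Block-Based Gaussian Splatting | BlockGaussian: efficient novel view synthesis for large-scale scenes via adaptive block-based Gaussian splatting. | 3D gaussian splatting, 3DGS, gaussian splatting
4 | AerOSeg: Harnessing SAM for Open-Vocabulary Segmentation in Remote Sensing Images | AerOSeg: leverages SAM for open-vocabulary segmentation of remote sensing images. | open-vocabulary, open vocabulary
5 | Text To 3D Object Generation For Scalable Room Assembly | Proposes a scalable room-assembly system built on text-to-3D object generation, intended for synthetic data generation. | depth estimation, neural radiance field, scene understanding
6 | SCFlow2: Plug-and-Play Object Pose Refiner with Shape-Constraint Scene Flow | SCFlow2: a plug-and-play object pose refiner based on shape-constrained scene flow. | scene flow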

🔬 Pillar 9: Embodied Foundation Models (6 papers)

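# | Title | One-line summary | Tags | 🔗
7 | SDIGLM: Leveraging Large Language Models and Multi-Modal Chain of Thought for Structural Damage Identification | SDIGLM: uses large language models and multimodal chain-of-thought for structural damage identification. | large language model, chain-of-thought
8 | REMEMBER: Retrieval-based Explainable Multimodal Evidence-guided Modeling for Brain Evaluation and Reasoning in Zero- and Few-shot Neurodegenerative Diagnosis | REMEMBER: a retrieval-based, explainable, multimodal evidence-guided model for zero-shot and few-shot diagnosis of neurodegenerative diseases. | multimodal
9 | DL-QAT: Weight-Decomposed Low-Rank Quantization-Aware Training for Large Language Models | Proposes weight-decomposed low-rank quantization-aware training (DL-QAT) for efficiently quantizing large language models. | large language model
10 | seg2med: a bridge from artificial anatomy to multimodal medical images | Seg2Med: a bridge from artificial anatomy to multimodal medical images. | multimodal
11 | VideoAds for Fast-Paced Video Understanding | VideoAds: a benchmark dataset for evaluating multimodal large language models on fast-paced video understanding. | large language model
12 | FVQ: A Large-Scale Dataset and an LMM-based Method for Face Video Quality Assessment | Proposes FVQ-20K, a large-scale face video quality assessment dataset, and FVQ-Rater, an LMM-based assessment method. | multimodal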

🔬 Pillar 2: RL & Architecture (2 papers)

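# | Title | One-line summary | Tags | 🔗
13 | PathVLM-R1: A Reinforcement Learning-Driven Reasoning Model for Pathology Visual-Language Tasks | PathVLM-R1: a reinforcement-learning-based reasoning model for pathology vision-language tasks that improves diagnostic accuracy and generalization. | reinforcement learning, multimodal
14 | UniFlowRestore: A General Video Restoration Framework via Flow Matching and Prompt Guidance | Proposes UniFlowRestore, a general video restoration framework based on flow matching and prompt guidance. | flow matching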

🔬 Pillar 1: Robot Control (1 paper)

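# | Title | One-line summary | Tags | 🔗
15 | BIGS: Bimanual Category-agnostic Interaction Reconstruction from Monocular Videos via 3D Gaussian Splatting | BIGS: bimanual, category-agnostic interaction reconstruction from monocular video using 3D Gaussian splatting. | bi-manual, distillation, 3D gaussian splatting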

🔬 Pillar 7: Motion Retargeting (1 paper)

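# | Title | One-line summary | Tags | 🔗
16 | Using Vision Language Models for Safety Hazard Identification in Construction | Proposes a vision-language-model-based framework for identifying safety hazards on construction sites, improving situational awareness. | spatial relationship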
