cs.CV (2025-04-12)

📊 16 papers in total | 🔗 2 with code

🎯 Navigation by interest area

Pillar 3: Perception & Semantics (6 🔗2) · Pillar 9: Embodied Foundation Models (6) · Pillar 2: RL & Architecture (2) · Pillar 1: Robot Control (1) · Pillar 7: Motion Retargeting (1)

🔬 Pillar 3: Perception & Semantics (6 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	You Need a Transition Plane: Bridging Continuous Panoramic 3D Reconstruction with Perspective Gaussian Splatting	Proposes the TPGS framework to address projection distortion in panoramic 3D reconstruction	3D gaussian splatting 3DGS gaussian splatting	✅
2	A Constrained Optimization Approach for Gaussian Splatting from Coarsely-posed Images and Noisy Lidar Point Clouds	Proposes a constrained-optimization Gaussian splatting method for reconstructing scenes from coarsely posed images and noisy lidar point clouds	3D gaussian splatting 3DGS gaussian splatting
3	BlockGaussian: Efficient Large-Scale Scene Novel View Synthesis via Adaptive Block-Based Gaussian Splatting	BlockGaussian: efficient novel view synthesis for large-scale scenes via adaptive block-based Gaussian splatting	3D gaussian splatting 3DGS gaussian splatting	✅
4	AerOSeg: Harnessing SAM for Open-Vocabulary Segmentation in Remote Sensing Images	AerOSeg: uses SAM for open-vocabulary segmentation of remote sensing images	open-vocabulary open vocabulary
5	Text To 3D Object Generation For Scalable Room Assembly	Proposes a scalable room assembly system based on text-to-3D object generation, used for synthetic data generation	depth estimation neural radiance field scene understanding
6	SCFlow2: Plug-and-Play Object Pose Refiner with Shape-Constraint Scene Flow	SCFlow2: a plug-and-play object pose refiner based on shape-constraint scene flow	scene flow

🔬 Pillar 9: Embodied Foundation Models (6 papers)

#	Title	One-line summary	Tags	🔗	⭐
7	SDIGLM: Leveraging Large Language Models and Multi-Modal Chain of Thought for Structural Damage Identification	SDIGLM: uses large language models and multimodal chain of thought for structural damage identification	large language model chain-of-thought
8	REMEMBER: Retrieval-based Explainable Multimodal Evidence-guided Modeling for Brain Evaluation and Reasoning in Zero- and Few-shot Neurodegenerative Diagnosis	REMEMBER: a retrieval-based, explainable, multimodal evidence-guided model for zero-shot and few-shot neurodegenerative disease diagnosis	multimodal
9	DL-QAT: Weight-Decomposed Low-Rank Quantization-Aware Training for Large Language Models	Proposes weight-decomposed low-rank quantization-aware training (DL-QAT) for efficient quantization of large language models	large language model
10	seg2med: a bridge from artificial anatomy to multimodal medical images	Seg2Med: a bridge from artificial anatomy to multimodal medical images	multimodal
11	VideoAds for Fast-Paced Video Understanding	VideoAds: a benchmark dataset for multimodal large language models on fast-paced video understanding	large language model
12	FVQ: A Large-Scale Dataset and an LMM-based Method for Face Video Quality Assessment	Proposes FVQ-20K, a large-scale face video quality assessment dataset, and FVQ-Rater, an LMM-based assessment method	multimodal

🔬 Pillar 2: RL & Architecture (2 papers)

#	Title	One-line summary	Tags	🔗	⭐
13	PathVLM-R1: A Reinforcement Learning-Driven Reasoning Model for Pathology Visual-Language Tasks	PathVLM-R1: a reinforcement-learning-based pathology vision-language reasoning model that improves diagnostic accuracy and generalization	reinforcement learning multimodal
14	UniFlowRestore: A General Video Restoration Framework via Flow Matching and Prompt Guidance	Proposes UniFlowRestore, a general video restoration framework built on flow matching and prompt guidance	flow matching

🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
15	BIGS: Bimanual Category-agnostic Interaction Reconstruction from Monocular Videos via 3D Gaussian Splatting	BIGS: category-agnostic bimanual interaction reconstruction from monocular video using 3D Gaussian splatting	bi-manual distillation 3D gaussian splatting

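🔬 Pillar 7: Motion Retargeting (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
16	Using Vision Language Models for Safety Hazard Identification in Construction	Proposes a vision-language-model-based framework for identifying safety hazards on construction sites, improving situational awareness	spatial relationship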
