cs.CV (2024-12-23)

📊 16 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (8 🔗2) | Pillar 3: Perception & Semantics (6 🔗2) | Pillar 2: RL & Architecture (1) | Pillar 4: Generative Motion (1)

🔬 Pillar 9: Embodied Foundation Models (8 papers)

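#	Title	One-line summary	Tags	🔗	⭐
1	ChatGarment: Garment Estimation, Generation and Editing via Large Language Models	ChatGarment: uses large language models for garment estimation, generation, and editing.	large language model multimodal
2	A Multimodal Fusion Framework for Bridge Defect Detection with Cross-Verification	Proposes a multimodal fusion framework for bridge defect detection with cross-verification.	multimodal
3	Reasoning to Attend: Try to Understand How <SEG> Token Works	Proposes the READ framework, which uses semantic similarity to guide LMMs to attend to target regions, improving visual grounding.	large language model multimodal visual grounding	✅
4	EPE-P: Evidence-based Parameter-efficient Prompting for Multimodal Learning with Missing Modalities	Proposes EPE-P to address missing modalities in multimodal learning, improving parameter efficiency and model performance.	multimodal	✅
5	SCBench: A Sports Commentary Benchmark for Video LLMs	Proposes SCBench, a benchmark for evaluating video LLMs on sports commentary generation.	large language model chain-of-thought
6	HumanVBench: Exploring Human-Centric Video Understanding Capabilities of MLLMs with Synthetic Benchmark Data	HumanVBench: a synthetic benchmark dataset for evaluating the human-centric video understanding capabilities of MLLMs.	large language model multimodal
7	S-INF: Towards Realistic Indoor Scene Synthesis via Scene Implicit Neural Field	Proposes S-INF to address multimodal relationships in indoor scene synthesis.	multimodal
8	WildPPG: A Real-World PPG Dataset of Long Continuous Recordings	WildPPG: releases a dataset of long continuous PPG recordings and proposes a more robust method for heart-rate estimation in real-world settings.	multimodal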

🔬 Pillar 3: Perception & Semantics (6 papers)

#	Title	One-line summary	Tags	🔗	⭐
9	CoSurfGS:Collaborative 3D Surface Gaussian Splatting with Distributed Learning for Large Scene Reconstruction	Proposes CoSurfGS, a collaborative 3D surface Gaussian splatting method based on distributed learning for large-scene reconstruction.	3D gaussian splatting 3DGS gaussian splatting	✅
10	V$^2$-SfMLearner: Learning Monocular Depth and Ego-motion for Multimodal Wireless Capsule Endoscopy	V$^2$-SfMLearner: learns monocular depth and ego-motion for wireless capsule endoscopy by incorporating vibration signals.	monocular depth scene reconstruction multimodal
11	LangSurf: Language-Embedded Surface Gaussians for 3D Scene Understanding	LangSurf: a language-embedded surface Gaussian representation for 3D scene understanding.	gaussian splatting splatting scene understanding
12	GaussianPainter: Painting Point Cloud into 3D Gaussians with Normal Guidance	GaussianPainter: proposes a normal-guided, single-forward-pass method that paints point clouds into 3D Gaussians.	3D gaussian splatting gaussian splatting splatting
13	Reconstructing People, Places, and Cameras	Proposes HSfM, which jointly reconstructs human meshes, scene point clouds, and camera parameters from multi-view images.	scene reconstruction spatial relationship	✅
14	Comprehensive Multi-Modal Prototypes are Simple and Effective Classifiers for Vast-Vocabulary Object Detection	Prova: a simple and effective multimodal prototype classifier for vast-vocabulary object detection.	open-vocabulary open vocabulary

🔬 Pillar 2: RL & Architecture (1 paper)

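#	Title	One-line summary	Tags	🔗	⭐
15	The Dynamic Duo of Collaborative Masking and Target for Advanced Masked Autoencoder Learning	Proposes CMT-MAE, which improves masked autoencoder performance through collaborative masking and collaborative targets.	representation learning masked autoencoder MAE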

🔬 Pillar 4: Generative Motion (1 paper)

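#	Title	One-line summary	Tags	🔗	⭐
16	GausSim: Foreseeing Reality by Gaussian Simulator for Elastic Objects	GausSim: simulates the dynamic behavior of elastic objects using Gaussian kernels to predict how they behave in the real world.	physically plausible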
