cs.CV (2024-08-23)

📊 12 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (3) · Pillar 2: RL & Architecture (3 🔗2) · Pillar 9: Embodied Foundation Models (3 🔗1) · Pillar 1: Robot Control (2) · Pillar 4: Generative Motion (1 🔗1)

🔬 Pillar 3: Perception & Semantics (3 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	BiGS: Bidirectional Gaussian Primitives for Relightable 3D Gaussian Splatting	Proposes bidirectional Gaussian primitives (BiGS) for relightable 3D Gaussian Splatting under dynamic lighting	3D gaussian splatting gaussian splatting splatting
2	SpecGaussian with Latent Features: A High-quality Modeling of the View-dependent Appearance for 3D Gaussian Splatting	Proposes Lantent-SpecGS, which models the view-dependent appearance of 3D Gaussian Splatting with latent features to improve rendering quality	3D gaussian splatting gaussian splatting splatting
3	Map-Free Visual Relocalization Enhanced by Instance Knowledge and Depth Knowledge	Proposes a map-free visual relocalization method that fuses instance knowledge and depth knowledge	metric depth

🔬 Pillar 2: RL & Architecture (3 papers)

#	Title	One-line summary	Tags	🔗	⭐
4	VFM-Det: Towards High-Performance Vehicle Detection via Large Foundation Models	VFM-Det: high-performance vehicle detection built on large pretrained models	contrastive learning large language model foundation model	✅
5	Foundational Model for Electron Micrograph Analysis: Instruction-Tuning Small-Scale Language-and-Vision Assistant for Enterprise Adoption	Proposes MAEMI, a small-scale instruction-tuned vision-language foundation model for analyzing semiconductor electron micrographs	distillation multimodal instruction following
6	SeA: Semantic Adversarial Augmentation for Last Layer Features from Unsupervised Representation Learning	Proposes Semantic Adversarial Augmentation (SeA) to improve the downstream task performance of fixed deep features from unsupervised representation learning	representation learning	✅

🔬 Pillar 9: Embodied Foundation Models (3 papers)

#	Title	One-line summary	Tags	🔗	⭐
7	MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?	MME-RealWorld: a high-resolution, real-world benchmark for evaluating multimodal LLMs	large language model multimodal
8	VALE: A Multimodal Visual and Language Explanation Framework for Image Classifiers using eXplainable AI and Language Models	VALE: a multimodal visual and language explanation framework for image classifiers	multimodal
9	Online Zero-Shot Classification with CLIP	Proposes OnZeta, an online zero-shot classification method that exploits the target data distribution to improve CLIP performance	zero-shot transfer	✅

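🔬 Pillar 1: Robot Control (2 papers)

#	Title	One-line summary	Tags	🔗	⭐
10	ShapeICP: Iterative Category-level Object Pose and Shape Estimation from Depth	ShapeICP: iterative category-level object pose and shape estimation from depth maps	manipulation
11	Task-Oriented Diffusion Inversion for High-Fidelity Text-based Editing	Proposes a task-oriented diffusion inversion method to address precision issues in image editing	manipulation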

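🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
12	CustomCrafter: Customized Video Generation with Preserving Motion and Concept Composition Abilities	CustomCrafter: a new framework for customized video generation that needs no additional videos or fine-tuning while preserving motion and concept composition abilities	motion generation	✅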
