cs.CV (2024-04-03)

📊 19 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (7 🔗3) · Pillar 9: Embodied Foundation Models (6) · Pillar 1: Robot Control (4 🔗2) · Pillar 2: RL Algorithms & Architecture (2 🔗1)

🔬 Pillar 3: Spatial Perception & Semantics (7 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
1	ALOHa: A New Measure for Hallucination in Captioning Models	Proposes ALOHa, a metric for measuring hallucination in visual captioning models	open-vocabulary open vocabulary large language model	✅
2	TCLC-GS: Tightly Coupled LiDAR-Camera Gaussian Splatting for Autonomous Driving	Proposes TCLC-GS to address insufficient fusion of LiDAR and camera data	3D gaussian splatting 3D reconstruction gaussian splatting
3	Neural Radiance Fields with Torch Units	Proposes Torch-NeRF to address complex scene reconstruction	3D reconstruction NeRF neural radiance field
4	Behind the Veil: Enhanced Indoor 3D Scene Reconstruction with Occluded Surfaces Completion	Proposes a new method for completing occluded surfaces in indoor 3D reconstruction	3D reconstruction scene reconstruction
5	Freditor: High-Fidelity and Transferable NeRF Editing by Frequency Decomposition	Proposes a frequency decomposition method for high-fidelity, transferable NeRF editing	NeRF	✅
6	LiDAR4D: Dynamic Neural Fields for Novel Space-time View LiDAR Synthesis	Proposes LiDAR4D to address view synthesis for dynamic LiDAR scenes	NeRF neural radiance field	✅
7	APC2Mesh: Bridging the gap from occluded building façades to full 3D models	Proposes APC2Mesh to address occlusion of building façades	3D reconstruction

🔬 Pillar 9: Embodied Foundation Models (6 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
8	SalFoM: Dynamic Saliency Prediction with Video Foundation Models	Proposes SalFoM to address dynamic modeling in video saliency prediction	foundation model
9	VIAssist: Adapting Multi-modal Large Language Models for Users with Visual Impairments	Proposes VIAssist to help users with visual impairments make use of multimodal large language models	large language model
10	Cohort-Individual Cooperative Learning for Multimodal Cancer Survival Analysis	Proposes the CCL framework to address information heterogeneity in multimodal cancer survival analysis	multimodal
11	LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models	Proposes LVLM-Interpret to improve the interpretability of large vision-language models	large language model
12	DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement	Proposes the DIBS framework to improve the quality of dense video captioning	large language model
13	Diffexplainer: Towards Cross-modal Global Explanations with Diffusion Models	Proposes DiffExplainer to produce cross-modal global explanations	multimodal

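🔬 Pillar 1: Robot Control (4 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
14	Text-driven Affordance Learning from Egocentric Vision	Proposes a text-driven affordance learning method to address robot interaction	manipulation affordance egocentric
15	AWOL: Analysis WithOut synthesis using Language	Proposes a language-driven 3D shape generation method to address modeling difficulties	quadruped
16	Deep Image Composition Meets Image Forgery	Proposes an automated data generation method to address the shortage of training data for image forgery detection	manipulation	✅
17	3DStyleGLIP: Part-Tailored Text-Guided 3D Neural Stylization	Proposes 3DStyleGLIP to address fine-detail stylization of 3D objects	manipulation	✅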

🔬 Pillar 2: RL Algorithms & Architecture (2 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
18	RS3Mamba: Visual State Space Model for Remote Sensing Images Semantic Segmentation	Proposes RS3Mamba to address long-range modeling in semantic segmentation of remote sensing images	Mamba state space model	✅
19	DeiT-LT: Distillation Strikes Back for Vision Transformer Training on Long-Tailed Datasets	Proposes DeiT-LT to address ViT training on long-tailed datasets	distillation
