cs.CV (2024-04-04)

📊 36 papers in total | 🔗 18 with code

🎯 Browse by interest area

Pillar 3: Perception & Semantics (15 🔗8) · Pillar 9: Embodied Foundation Models (11 🔗5) · Pillar 2: RL & Architecture (5 🔗2) · Pillar 7: Motion Retargeting (2 🔗1) · Pillar 1: Robot Control (2 🔗1) · Pillar 6: Video Extraction (1 🔗1)

🔬 Pillar 3: Perception & Semantics (15 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	Per-Gaussian Embedding-Based Deformation for Deformable 3D Gaussian Splatting	Proposes a per-Gaussian embedding-based deformation method for dynamic scene reconstruction	3D gaussian splatting 3DGS gaussian splatting	✅
2	OmniGS: Fast Radiance Field Reconstruction using Omnidirectional Gaussian Splatting	Proposes OmniGS to address the limitations of conventional 3D Gaussian splatting reconstruction	3D gaussian splatting gaussian splatting splatting
3	Adaptive Discrete Disparity Volume for Self-supervised Monocular Depth Estimation	Proposes an adaptive discrete disparity volume for monocular depth estimation	depth estimation monocular depth
4	WorDepth: Variational Language Prior for Monocular Depth Estimation	Proposes WorDepth to address ambiguity in monocular depth estimation	depth estimation monocular depth
5	GaSpCT: Gaussian Splatting for Novel CT Projection View Synthesis	Proposes GaSpCT for view synthesis of CT scans	gaussian splatting splatting
6	Gen3DSR: Generalizable 3D Scene Reconstruction via Divide and Conquer from a Single View	Proposes Gen3DSR for single-view 3D reconstruction of complex scenes	3D reconstruction scene reconstruction	✅
7	VF-NeRF: Viewshed Fields for Rigid NeRF Registration	Proposes VF-NeRF for rigid registration of NeRFs	NeRF neural radiance field
8	Is CLIP the main roadblock for fine-grained open-world perception?	Proposes improving CLIP to address fine-grained open-world perception	open-vocabulary open vocabulary multimodal	✅
9	RaFE: Generative Radiance Fields Restoration	Proposes RaFE to restore NeRFs built from low-quality inputs	3D reconstruction NeRF neural radiance field	✅
10	OpenNeRF: Open Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views	Proposes OpenNeRF for open-set 3D scene segmentation	NeRF open-vocabulary open vocabulary
11	Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning	Proposes KYN to address spatial vision-language reasoning in single-view reconstruction	depth estimation scene reconstruction	✅
12	Learning Transferable Negative Prompts for Out-of-Distribution Detection	Proposes NegPrompt to address false positives in OOD detection	open-vocabulary open vocabulary	✅
13	LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity	Proposes LeGrad to address the explainability of vision transformers	open-vocabulary open vocabulary	✅
14	The More You See in 2D, the More You Perceive in 3D	Proposes SAP3D for 3D reconstruction from unposed images	3D reconstruction
15	MonoCD: Monocular 3D Object Detection with Complementary Depths	Proposes MonoCD to address depth estimation in monocular 3D object detection	depth estimation	✅

🔬 Pillar 9: Embodied Foundation Models (11 papers)

#	Title	One-line summary	Tags	🔗	⭐
16	MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens	Proposes MiniGPT4-Video to address multimodal challenges in video understanding	large language model multimodal	✅
17	PARIS3D: Reasoning-based 3D Part Segmentation Using Large Multimodal Model	Proposes a reasoning-based 3D part segmentation method to address the limitations of existing systems	multimodal	✅
18	No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance	Analyzes the data requirements of multimodal models to address zero-shot generalization	multimodal
19	TinyVQA: Compact Multimodal Deep Neural Network for Visual Question Answering on Resource-Constrained Devices	Proposes TinyVQA for visual question answering on resource-constrained devices	multimodal
20	Scaling Up Video Summarization Pretraining with Large Language Models	Proposes a large language model-based video summarization method to address the shortage of datasets	large language model
21	LongVLM: Efficient Long Video Understanding via Large Language Models	Proposes LongVLM to address the loss of local information in long video understanding	large language model	✅
22	OW-VISCapTor: Abstractors for Open-World Video Instance Segmentation and Captioning	Proposes OW-VISCapTor for open-world video instance segmentation and captioning	large language model foundation model
23	SemGrasp: Semantic Grasp Generation via Language Aligned Discretization	Proposes SemGrasp to address grasp generation problems caused by insufficient semantic information	large language model multimodal	✅
24	Test Time Training for Industrial Anomaly Segmentation	Proposes a test-time training strategy for industrial anomaly segmentation	multimodal
25	HAPNet: Toward Superior RGB-Thermal Scene Parsing via Hybrid, Asymmetric, and Progressive Heterogeneous Feature Fusion	Proposes HAPNet to address feature fusion in RGB-thermal scene parsing	foundation model
26	iSeg: Interactive 3D Segmentation via Interactive Attention	Proposes iSeg for interactive segmentation of 3D shapes	foundation model	✅

🔬 Pillar 2: RL & Architecture (5 papers)

#	Title	One-line summary	Tags	🔗	⭐
27	InsectMamba: Insect Pest Classification with State Space Model	Proposes InsectMamba for insect pest classification	Mamba SSM state space model
28	SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer	Proposes the SC4D framework to decouple motion and appearance in video-to-4D generation	distillation NeRF motion prediction
29	SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation	Proposes SDPose to address the underperformance of small transformer models	distillation	✅
30	Sparse Concept Bottleneck Models: Gumbel Tricks in Contrastive Learning	Proposes sparse concept bottleneck models to improve interpretable classification performance	contrastive learning	✅
31	FACTUAL: A Novel Framework for Contrastive Learning Based Robust SAR Image Classification	Proposes the FACTUAL framework to address adversarial robustness in SAR image classification	contrastive learning

🔬 Pillar 7: Motion Retargeting (2 papers)

#	Title	One-line summary	Tags	🔗	⭐
32	Towards more realistic human motion prediction with attention to motion coordination	Proposes coordination attractors to address motion coordination in human motion prediction	human motion human motion prediction motion prediction
33	Quantifying Uncertainty in Motion Prediction with Variational Bayesian Mixture	Proposes SeNeVA to address uncertainty in motion prediction for autonomous vehicles	motion prediction	✅

🔬 Pillar 1: Robot Control (2 papers)

#	Title	One-line summary	Tags	🔗	⭐
34	You Only Scan Once: A Dynamic Scene Reconstruction Pipeline for 6-DoF Robotic Grasping of Novel Objects	Proposes a dynamic scene reconstruction pipeline for 6-DoF robotic grasping	manipulation scene reconstruction scene understanding
35	BioVL-QR: Egocentric Biochemical Vision-and-Language Dataset Using Micro QR Codes	Proposes the BioVL-QR dataset to address the difficulty of biochemical video understanding	manipulation egocentric	✅

🔬 Pillar 6: Video Extraction (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
36	AGL-NET: Aerial-Ground Cross-Modal Global Localization with Varying Scales	Proposes AGL-NET to address scale differences in multimodal global localization	feature matching	✅
