cs.CV (2024-04-02)

📊 28 papers in total | 🔗 10 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (9 🔗5) · Pillar 9: Embodied Foundation Models (7 🔗3) · Pillar 3: Perception & Semantics (6) · Pillar 1: Robot Control (3 🔗1) · Pillar 5: Interaction & Reaction (1 🔗1) · Pillar 6: Video Extraction (1) · Pillar 7: Motion Retargeting (1)

🔬 Pillar 2: RL & Architecture (9 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
1	IISAN: Efficiently Adapting Multimodal Representation for Sequential Recommendation with Decoupled PEFT	Proposes IISAN to address GPU memory and training-speed bottlenecks in multimodal recommender systems	representation learning foundation model multimodal	✅
2	ContrastCAD: Contrastive Learning-based Representation Learning for Computer-Aided Design Models	Proposes ContrastCAD to address challenges in representation learning for CAD models	representation learning contrastive learning	✅
3	TSCM: A Teacher-Student Model for Vision Place Recognition Using Cross-Metric Knowledge Distillation	Proposes the TSCM model to reduce the computational cost of visual place recognition	teacher-student distillation	✅
4	DELAN: Dual-Level Alignment for Vision-and-Language Navigation by Cross-Modal Contrastive Learning	Proposes the DELAN framework to address cross-modal alignment in vision-and-language navigation	contrastive learning VLN
5	Samba: Semantic Segmentation of Remotely Sensed Images with State Space Model	Proposes Samba for semantic segmentation of high-resolution remote sensing images	Mamba SSM state space model	✅
6	A Universal Knowledge Embedded Contrastive Learning Framework for Hyperspectral Image Classification	Proposes a universal knowledge-embedded contrastive learning framework for hyperspectral image classification	contrastive learning HSI	✅
7	Task Integration Distillation for Object Detectors	Proposes a task integration distillation method to improve object detection performance	distillation
8	CHOSEN: Contrastive Hypothesis Selection for Multi-View Depth Refinement	Proposes CHOSEN for multi-view depth refinement	contrastive learning depth estimation
9	Towards Robust 3D Pose Transfer with Adversarial Learning	Proposes an adversarial learning method to improve the robustness of 3D pose transfer	masked autoencoder MAE

🔬 Pillar 9: Embodied Foundation Models (7 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
10	FashionEngine: Interactive 3D Human Generation and Editing via Multimodal Controls	Proposes FashionEngine for interactive 3D human generation and editing	multimodal	✅
11	mChartQA: A universal benchmark for multimodal Chart Question Answer based on Vision-Language Alignment and Reasoning	Proposes mChartQA to address complex challenges in multimodal chart question answering	multimodal
12	Unleash the Potential of CLIP for Video Highlight Detection	Proposes Highlight-CLIP for video highlight detection	large language model multimodal
13	Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation	Proposes a hierarchical neural radiance representation to address environment prediction in vision-language navigation	VLN
14	Minimize Quantization Output Error with Bias Compensation	Proposes a bias compensation method to reduce quantization output error	large language model	✅
15	T-VSL: Text-Guided Visual Sound Source Localization in Mixtures	Proposes T-VSL to localize sound sources in multi-source mixtures	zero-shot transfer	✅
16	Precise and Robust Sidewalk Detection: Leveraging Ensemble Learning to Surpass LLM Limitations in Urban Environments	Proposes an ensemble learning model to improve sidewalk detection accuracy in urban environments	large language model

🔬 Pillar 3: Perception & Semantics (6 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
17	GS2Mesh: Surface Reconstruction from Gaussian Splatting via Novel Stereo Views	Proposes GS2Mesh for surface reconstruction from Gaussian splatting	3D gaussian splatting 3DGS gaussian splatting
18	Alpha Invariance: On Inverse Scaling Between Distance and Volume Density in Neural Radiance Fields	Proposes Alpha Invariance to address volume density scaling in neural radiance fields	NeRF neural radiance field
19	NeRFCodec: Neural Feature Compression Meets Neural Radiance Fields for Memory-Efficient Scene Representation	Proposes NeRFCodec to address the low compression efficiency of NeRF	NeRF neural radiance field
20	OFMPNet: Deep End-to-End Model for Occupancy and Flow Prediction in Urban Environment	Proposes OFMPNet for occupancy and flow prediction in urban environments	occupancy grid motion prediction
21	Segment Any 3D Object with Language	Proposes SOLE for open-vocabulary 3D instance segmentation	open-vocabulary open vocabulary multimodal
22	ViTamin: Designing Scalable Vision Models in the Vision-Language Era	Proposes ViTamin to improve the performance and scalability of vision-language models	open-vocabulary open vocabulary

🔬 Pillar 1: Robot Control (3 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
23	MotionChain: Conversational Motion Controllers via Multimodal Prompts	Proposes MotionChain for conversational control of human motion generation	humanoid humanoid robot motion generation
24	Learning to Control Camera Exposure via Reinforcement Learning	Proposes a deep reinforcement learning framework for camera exposure control under dynamic lighting	domain randomization reinforcement learning deep reinforcement learning
25	EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis	Proposes EDTalk to address feature disentanglement in emotional talking head synthesis	manipulation	✅

🔬 Pillar 5: Interaction & Reaction (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
26	Disentangled Pre-training for Human-Object Interaction Detection	Proposes an efficient disentangled pre-training method to improve human-object interaction detection	human-object interaction HOI	✅

🔬 Pillar 6: Video Extraction (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
27	PREGO: online mistake detection in PRocedural EGOcentric videos	Proposes PREGO for online detection of procedural mistakes	egocentric

🔬 Pillar 7: Motion Retargeting (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
28	Leveraging Digital Perceptual Technologies for Remote Perception and Analysis of Human Biomechanical Processes: A Contactless Approach for Workload and Joint Force Assessment	Proposes a contactless approach for assessing human biomechanical processes	human motion
