cs.CV (2024-04-02)

📊 28 papers in total | 🔗 10 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (9 🔗5) | Pillar 9: Embodied Foundation Models (7 🔗3) | Pillar 3: Perception & Semantics (6) | Pillar 1: Robot Control (3 🔗1) | Pillar 5: Interaction & Reaction (1 🔗1) | Pillar 6: Video Extraction (1) | Pillar 7: Motion Retargeting (1)

🔬 Pillar 2: RL & Architecture (9 papers)

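| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | IISAN: Efficiently Adapting Multimodal Representation for Sequential Recommendation with Decoupled PEFT | Proposes IISAN to address GPU memory and training speed issues in multimodal recommender systems | representation learning, foundation model, multimodal | |
| 2 | ContrastCAD: Contrastive Learning-based Representation Learning for Computer-Aided Design Models | Proposes ContrastCAD to address challenges in representation learning for CAD models | representation learning, contrastive learning | |
| 3 | TSCM: A Teacher-Student Model for Vision Place Recognition Using Cross-Metric Knowledge Distillation | Proposes the TSCM model to address computational resource consumption in visual place recognition | teacher-student, distillation | |
| 4 | DELAN: Dual-Level Alignment for Vision-and-Language Navigation by Cross-Modal Contrastive Learning | Proposes the DELAN framework to address cross-modal alignment in vision-and-language navigation | contrastive learning, VLN | |
| 5 | Samba: Semantic Segmentation of Remotely Sensed Images with State Space Model | Proposes Samba for semantic segmentation of high-resolution remote sensing images | Mamba, SSM, state space model | |
| 6 | A Universal Knowledge Embedded Contrastive Learning Framework for Hyperspectral Image Classification | Proposes a universal knowledge-embedded contrastive learning framework for hyperspectral image classification | contrastive learning, HSI | |
| 7 | Task Integration Distillation for Object Detectors | Proposes a task integration distillation method to improve object detection performance | distillation | |
| 8 | CHOSEN: Contrastive Hypothesis Selection for Multi-View Depth Refinement | Proposes CHOSEN to address multi-view depth refinement | contrastive learning, depth estimation | |
| 9 | Towards Robust 3D Pose Transfer with Adversarial Learning | Proposes an adversarial learning method to improve the robustness of 3D pose transfer | masked autoencoder, MAE | |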

🔬 Pillar 9: Embodied Foundation Models (7 papers)

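| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 10 | FashionEngine: Interactive 3D Human Generation and Editing via Multimodal Controls | Proposes FashionEngine for interactive 3D human generation and editing | multimodal | |
| 11 | mChartQA: A universal benchmark for multimodal Chart Question Answer based on Vision-Language Alignment and Reasoning | Proposes mChartQA to address complex challenges in multimodal chart question answering | multimodal | |
| 12 | Unleash the Potential of CLIP for Video Highlight Detection | Proposes Highlight-CLIP to address video highlight detection | large language model, multimodal | |
| 13 | Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation | Proposes a hierarchical neural radiance representation to address environment prediction in vision-language navigation | VLN | |
| 14 | Minimize Quantization Output Error with Bias Compensation | Proposes a bias compensation method to address quantization output error | large language model | |
| 15 | T-VSL: Text-Guided Visual Sound Source Localization in Mixtures | Proposes T-VSL to address sound source localization in multi-source mixtures | zero-shot transfer | |
| 16 | Precise and Robust Sidewalk Detection: Leveraging Ensemble Learning to Surpass LLM Limitations in Urban Environments | Proposes an ensemble learning model to improve sidewalk detection accuracy in urban environments | large language model | |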

🔬 Pillar 3: Perception & Semantics (6 papers)

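| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 17 | GS2Mesh: Surface Reconstruction from Gaussian Splatting via Novel Stereo Views | Proposes GS2Mesh to address surface reconstruction from Gaussian splatting | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 18 | Alpha Invariance: On Inverse Scaling Between Distance and Volume Density in Neural Radiance Fields | Proposes alpha invariance to address volume density scaling in neural radiance fields | NeRF, neural radiance field | |
| 19 | NeRFCodec: Neural Feature Compression Meets Neural Radiance Fields for Memory-Efficient Scene Representation | Proposes NeRFCodec to address the low compression efficiency of NeRF | NeRF, neural radiance field | |
| 20 | OFMPNet: Deep End-to-End Model for Occupancy and Flow Prediction in Urban Environment | Proposes OFMPNet to address occupancy and flow prediction in urban environments | occupancy grid, motion prediction | |
| 21 | Segment Any 3D Object with Language | Proposes SOLE to address open-vocabulary 3D instance segmentation | open-vocabulary, open vocabulary, multimodal | |
| 22 | ViTamin: Designing Scalable Vision Models in the Vision-Language Era | Proposes ViTamin to improve the performance and scalability of vision-language models | open-vocabulary, open vocabulary | |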

🔬 Pillar 1: Robot Control (3 papers)

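| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 23 | MotionChain: Conversational Motion Controllers via Multimodal Prompts | Proposes MotionChain to address conversational control of human motion generation | humanoid, humanoid robot, motion generation | |
| 24 | Learning to Control Camera Exposure via Reinforcement Learning | Proposes a deep reinforcement learning framework for camera exposure control to handle dynamic lighting | domain randomization, reinforcement learning, deep reinforcement learning | |
| 25 | EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis | Proposes EDTalk to address feature disentanglement in emotional talking head synthesis | manipulation | |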

🔬 Pillar 5: Interaction & Reaction (1 paper)

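| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 26 | Disentangled Pre-training for Human-Object Interaction Detection | Proposes an efficient disentangled pre-training method to improve human-object interaction detection performance | human-object interaction, HOI | |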

🔬 Pillar 6: Video Extraction (1 paper)

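| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 27 | PREGO: online mistake detection in PRocedural EGOcentric videos | Proposes PREGO to address online detection of procedural mistakes | egocentric | |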

🔬 Pillar 7: Motion Retargeting (1 paper)

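| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 28 | Leveraging Digital Perceptual Technologies for Remote Perception and Analysis of Human Biomechanical Processes: A Contactless Approach for Workload and Joint Force Assessment | Proposes a contactless method to assess human biomechanical processes | human motion | |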
