cs.CV (2024-04-02)
📊 28 papers in total | 🔗 10 with code
🎯 Interest Area Navigation
Pillar 2: RL Algorithms & Architecture (9 🔗5)
Pillar 9: Embodied Foundation Models (7 🔗3)
Pillar 3: Spatial Perception & Semantics (6)
Pillar 1: Robot Control (3 🔗1)
Pillar 5: Interaction & Reaction (1 🔗1)
Pillar 6: Video Extraction & Matching (1)
Pillar 7: Motion Retargeting (1)
🔬 Pillar 2: RL Algorithms & Architecture (9 papers)
🔬 Pillar 9: Embodied Foundation Models (7 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 10 | FashionEngine: Interactive 3D Human Generation and Editing via Multimodal Controls | Proposes FashionEngine for interactive 3D human generation and editing | multimodal | ✅ | |
| 11 | mChartQA: A universal benchmark for multimodal Chart Question Answer based on Vision-Language Alignment and Reasoning | Proposes mChartQA to address complex challenges in multimodal chart question answering | multimodal | | |
| 12 | Unleash the Potential of CLIP for Video Highlight Detection | Proposes Highlight-CLIP for video highlight detection | large language model, multimodal | | |
| 13 | Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation | Proposes a hierarchical neural radiance representation to address environment prediction in vision-language navigation | VLN | | |
| 14 | Minimize Quantization Output Error with Bias Compensation | Proposes a bias compensation method to reduce quantization output error | large language model | ✅ | |
| 15 | T-VSL: Text-Guided Visual Sound Source Localization in Mixtures | Proposes T-VSL for sound source localization in multi-source mixtures | zero-shot transfer | ✅ | |
| 16 | Precise and Robust Sidewalk Detection: Leveraging Ensemble Learning to Surpass LLM Limitations in Urban Environments | Proposes an ensemble learning model to improve sidewalk detection accuracy in urban environments | large language model | | |
🔬 Pillar 3: Spatial Perception & Semantics (6 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 17 | GS2Mesh: Surface Reconstruction from Gaussian Splatting via Novel Stereo Views | Proposes GS2Mesh for surface reconstruction from Gaussian splatting | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 18 | Alpha Invariance: On Inverse Scaling Between Distance and Volume Density in Neural Radiance Fields | Proposes alpha invariance to address volume density scaling in neural radiance fields | NeRF, neural radiance field | | |
| 19 | NeRFCodec: Neural Feature Compression Meets Neural Radiance Fields for Memory-Efficient Scene Representation | Proposes NeRFCodec to address the low compression efficiency of NeRF | NeRF, neural radiance field | | |
| 20 | OFMPNet: Deep End-to-End Model for Occupancy and Flow Prediction in Urban Environment | Proposes OFMPNet for occupancy and flow prediction in urban environments | occupancy grid, motion prediction | | |
| 21 | Segment Any 3D Object with Language | Proposes SOLE for open-vocabulary 3D instance segmentation | open-vocabulary, open vocabulary, multimodal | | |
| 22 | ViTamin: Designing Scalable Vision Models in the Vision-Language Era | Proposes ViTamin to improve the performance and scalability of vision-language models | open-vocabulary, open vocabulary | | |
🔬 Pillar 1: Robot Control (3 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 23 | MotionChain: Conversational Motion Controllers via Multimodal Prompts | Proposes MotionChain for conversational control of human motion generation | humanoid, humanoid robot, motion generation | | |
| 24 | Learning to Control Camera Exposure via Reinforcement Learning | Proposes a deep reinforcement learning framework for camera exposure control under dynamic lighting | domain randomization, reinforcement learning, deep reinforcement learning | | |
| 25 | EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis | Proposes EDTalk to address feature disentanglement in emotional talking head synthesis | manipulation | ✅ | |
🔬 Pillar 5: Interaction & Reaction (1 paper)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 26 | Disentangled Pre-training for Human-Object Interaction Detection | Proposes an efficient disentangled pre-training method to improve human-object interaction detection | human-object interaction, HOI | ✅ | |
🔬 Pillar 6: Video Extraction & Matching (1 paper)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 27 | PREGO: online mistake detection in PRocedural EGOcentric videos | Proposes PREGO for online detection of procedural mistakes | egocentric | | |
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 28 | Leveraging Digital Perceptual Technologies for Remote Perception and Analysis of Human Biomechanical Processes: A Contactless Approach for Workload and Joint Force Assessment | Proposes a contactless method for assessing human biomechanical processes | human motion | | |