cs.CV (2024-12-30)
📊 23 papers in total | 🔗 5 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (8 🔗1)
Pillar 3: Perception & Semantics (4 🔗1)
Pillar 1: Robot Control (3 🔗1)
Pillar 2: RL & Architecture (3 🔗1)
Pillar 4: Generative Motion (2)
Pillar 6: Video Extraction (1 🔗1)
Pillar 7: Motion Retargeting (1)
Pillar 8: Physics-based Animation (1)
🔬 Pillar 9: Embodied Foundation Models (8 papers)
🔬 Pillar 3: Perception & Semantics (4 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
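| 9 | KeyGS: A Keyframe-Centric Gaussian Splatting Method for Monocular Image Sequences | Proposes KeyGS to address the efficiency of 3D reconstruction from monocular image sequences. | 3D gaussian splatting 3DGS gaussian splatting | | |
| 10 | 4D Gaussian Splatting: Modeling Dynamic Scenes with Native 4D Primitives | Proposes a dynamic scene modeling method based on native 4D Gaussians, enabling real-time rendering of high-resolution dynamic scenes. | gaussian splatting splatting scene understanding | | |
| 11 | YOLO-UniOW: Efficient Universal Open-World Object Detection | YOLO-UniOW: an efficient universal open-world object detection model that addresses the limitations of traditional object detection. | open-vocabulary open vocabulary multimodal | ✅ | |
| 12 | FPGA-based Acceleration of Neural Network for Image Classification using Vitis AI | Uses Vitis AI to accelerate image classification neural networks on FPGA, improving throughput and energy efficiency. | depth estimation | | |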
🔬 Pillar 1: Robot Control (3 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 13 | ReFlow6D: Refraction-Guided Transparent Object 6D Pose Estimation via Intermediate Representation Learning | ReFlow6D: 6D pose estimation of transparent objects via refraction-guided intermediate representation learning. | manipulation representation learning 6D pose estimation | ✅ | |
| 14 | PERSE: Personalized 3D Generative Avatars from A Single Portrait | PERSE: generates personalized, controllable 3D avatars from a single portrait, enabling disentangled editing of facial attributes. | manipulation 3D gaussian splatting gaussian splatting | | |
| 15 | Edicho: Consistent Image Editing in the Wild | Edicho: a diffusion model based on explicit image correspondence for consistent editing of in-the-wild images. | manipulation classifier-free guidance | | |
🔬 Pillar 2: RL & Architecture (3 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 16 | Hierarchical Banzhaf Interaction for General Video-Language Representation Learning | Proposes a hierarchical Banzhaf interaction model to strengthen fine-grained semantic interaction in general video-language representation learning. | representation learning contrastive learning multimodal | | |
| 17 | VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation | Proposes the VisionReward framework to address human preference alignment in visual generation. | reinforcement learning preference learning | ✅ | |
| 18 | ILDiff: Generate Transparent Animated Stickers by Implicit Layout Distillation | Proposes ILDiff, which generates high-quality transparent animated stickers via implicit layout distillation. | distillation | | |
🔬 Pillar 4: Generative Motion (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 19 | LS-GAN: Human Motion Synthesis with Latent-space GANs | LS-GAN: efficient human motion synthesis using latent-space GANs. | motion synthesis | | |
| 20 | Diffgrasp: Whole-Body Grasping Synthesis Guided by Object Motion Using a Diffusion Model | Diffgrasp: whole-body grasping synthesis guided by object motion, using a diffusion model. | contact-aware human-object interaction | | |
🔬 Pillar 6: Video Extraction (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 21 | Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model | Vinci: a real-time embodied smart assistant based on an egocentric vision-language model. | egocentric egocentric vision | ✅ | |
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 22 | Slow Perception: Let's Perceive Geometric Figures Step-by-step | Proposes a "slow perception" strategy that improves LVLMs' ability to understand and reproduce geometric figures. | spatial relationship | | |
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 23 | LTX-Video: Realtime Video Latent Diffusion | LTX-Video: a Transformer-based latent diffusion model for real-time video generation. | spatiotemporal | | |