cs.CV (2025-01-21)
📊 25 papers in total | 🔗 4 with code
🎯 Navigation by Area of Interest
Pillar 3: Spatial Perception & Semantics (9 🔗1)
Pillar 9: Embodied Foundation Models (7 🔗1)
Pillar 2: RL Algorithms & Architecture (7 🔗2)
Pillar 7: Motion Retargeting (1)
Pillar 4: Generative Motion (1)
🔬 Pillar 3: Spatial Perception & Semantics (9 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | HAC++: Towards 100X Compression of 3D Gaussian Splatting | HAC++ compresses 3D Gaussian Splatting by up to 100× while improving rendering fidelity | 3D gaussian splatting 3DGS gaussian splatting | ✅ | |
| 2 | Survey on Monocular Metric Depth Estimation | Surveys monocular metric depth estimation as a way to address the consistency of depth predictions | visual SLAM depth estimation monocular depth | | |
| 3 | Video Depth Anything: Consistent Depth Estimation for Super-Long Videos | Video Depth Anything provides consistent depth estimation for super-long videos | depth estimation monocular depth Depth Anything | | |
| 4 | Towards Affordance-Aware Articulation Synthesis for Rigged Objects | Proposes A3Syn, which automatically synthesizes affordance-aware articulation poses for open-domain rigged objects | affordance affordance-aware | | |
| 5 | GSVC: Efficient Video Representation and Compression Through 2D Gaussian Splatting | Proposes GSVC, which represents and compresses video efficiently using 2D Gaussian splatting | gaussian splatting splatting | | |
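| 6 | Fast Underwater Scene Reconstruction using Multi-View Stereo and Physical Imaging | Proposes a fast underwater multi-view stereo reconstruction method grounded in physical imaging, improving both reconstruction quality and efficiency | depth estimation NeRF neural radiance field | | |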
| 7 | DARB-Splatting: Generalizing Splatting with Decaying Anisotropic Radial Basis Functions | Proposes a splatting method based on decaying anisotropic radial basis functions (DARB) that speeds up training and reduces memory consumption | 3D gaussian splatting gaussian splatting splatting | | |
| 8 | Learning segmentation from point trajectories | Learns video segmentation from point trajectories without additional supervision | optical flow | | |
| 9 | Continuous 3D Perception Model with Persistent State | Proposes CUT3R, a recurrent model with a persistent state for continuous 3D perception tasks | scene reconstruction | | |
🔬 Pillar 9: Embodied Foundation Models (7 papers)
🔬 Pillar 2: RL Algorithms & Architecture (7 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
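| 17 | High-dimensional multimodal uncertainty estimation by manifold alignment: Application to 3D right ventricular strain computations | Proposes a manifold-alignment-based method for high-dimensional multimodal uncertainty estimation, applied to 3D right ventricular strain computation | representation learning multimodal | | |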
| 18 | InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling | InternVideo2.5 strengthens video multimodal large language models (MLLMs) through long and rich context modeling | direct preference optimization spatiotemporal large language model | ✅ | |
| 19 | Contrastive Masked Autoencoders for Character-Level Open-Set Writer Identification | Proposes CMAE for character-level open-set writer identification | representation learning masked autoencoder MAE | | |
| 20 | Memory Storyboard: Leveraging Temporal Segmentation for Streaming Self-Supervised Learning from Egocentric Videos | Proposes Memory Storyboard, which uses temporal segmentation for streaming self-supervised learning from egocentric videos | contrastive learning egocentric | | |
| 21 | DNRSelect: Active Best View Selection for Deferred Neural Rendering | DNRSelect performs active best-view selection for deferred neural rendering | reinforcement learning NeRF geometric consistency | | |
| 22 | SMamba: Sparse Mamba for Event-based Object Detection | Proposes SMamba, a sparse Mamba architecture that improves the efficiency and accuracy of event-camera object detection | Mamba spatiotemporal | | |
| 23 | InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model | Proposes InternLM-XComposer2.5-Reward, a simple yet effective multimodal reward model for improving the generation quality of LVLMs | reinforcement learning PPO instruction following | ✅ | |
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 24 | Cinepro: Robust Training of Foundation Models for Cancer Detection in Prostate Ultrasound Cineloops | Cinepro improves foundation-model cancer detection in prostate ultrasound cineloops through robust training | spatial relationship foundation model | | |
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 25 | Regressor-Guided Generative Image Editing Balances User Emotions to Reduce Time Spent Online | Proposes regressor-guided generative image editing that balances user emotions to reduce time spent online | classifier-free guidance | | |