cs.CV (2024-09-11)
📊 23 papers in total | 🔗 4 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (8 🔗1)
Pillar 3: Perception & Semantics (6 🔗2)
Pillar 1: Robot Control (4)
Pillar 6: Video Extraction (2)
Pillar 2: RL & Architecture (2 🔗1)
Pillar 4: Generative Motion (1)
🔬 Pillar 9: Embodied Foundation Models (8 papers)
🔬 Pillar 3: Perception & Semantics (6 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 9 | ThermalGaussian: Thermal 3D Gaussian Splatting | Proposes ThermalGaussian, achieving high-quality 3D Gaussian reconstruction and real-time rendering in both RGB and thermal modalities | 3D gaussian splatting 3DGS gaussian splatting | ✅ | |
| 10 | Self-Evolving Depth-Supervised 3D Gaussian Splatting from Rendered Stereo Pairs | Proposes self-evolving depth-supervised 3D Gaussian Splatting that uses rendered stereo image pairs to improve depth accuracy | 3D gaussian splatting gaussian splatting splatting | | |
| 11 | Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models | Hi3D: high-resolution image-to-3D generation using video diffusion models | 3D gaussian splatting gaussian splatting splatting | ✅ | |
| 12 | Violence detection in videos using deep recurrent and convolutional neural networks | Proposes a deep learning architecture combining RNNs and CNNs for violence detection in videos | optical flow | | |
| 13 | DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation | DreamMesh: jointly manipulates and textures triangle meshes for high-quality text-to-3D generation | NeRF | | |
| 14 | Rethinking Directional Parameterization in Neural Implicit Surface Reconstruction | Proposes a hybrid directional parameterization that improves neural implicit surface reconstruction for complex materials and geometry | implicit representation | | |
🔬 Pillar 1: Robot Control (4 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 15 | StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos | StereoCrafter: a diffusion-based method for generating high-quality stereoscopic 3D video from monocular video | Apple Vision Pro splatting foundation model | | |
| 16 | Controllable retinal image synthesis using conditional StyleGAN and latent space manipulation for improved diagnosis and grading of diabetic retinopathy | Proposes a retinal image synthesis method based on conditional StyleGAN and latent space manipulation to improve diabetic retinopathy diagnosis | manipulation | | |
| 17 | Feature Importance in Pedestrian Intention Prediction: A Context-Aware Review | Proposes Context-aware Permutation Feature Importance (CAPFI) to improve the interpretability of pedestrian intention prediction models | locomotion predictive model | | |
| 18 | Single-View 3D Reconstruction via SO(2)-Equivariant Gaussian Sculpting Networks | Proposes SO(2)-equivariant Gaussian Sculpting Networks for single-view 3D reconstruction | manipulation | | |
🔬 Pillar 6: Video Extraction (2 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 19 | Benchmarking 2D Egocentric Hand Pose Datasets | Proposes a new dataset evaluation protocol for 2D egocentric hand pose estimation and benchmarks existing datasets | egocentric | | |
| 20 | FaVoR: Features via Voxel Rendering for Camera Relocalization | FaVoR: camera relocalization using voxel-rendered features, improving robustness to viewpoint changes | feature matching | | |
🔬 Pillar 2: RL & Architecture (2 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 21 | Current Symmetry Group Equivariant Convolution Frameworks for Representation Learning | Surveys symmetry group equivariant convolution frameworks for representation learning on non-Euclidean spaces | representation learning | | |
| 22 | Retinex-RAWMamba: Bridging Demosaicing and Denoising for Low-Light RAW Image Enhancement | Proposes Retinex-RAWMamba, bridging demosaicing and denoising for low-light RAW image enhancement | Mamba | ✅ | |
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 23 | DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures | DiffTED: diffusion-based one-shot audio-driven TED talk video generation with natural lip movement and rich body gestures | classifier-free guidance | | |