cs.CV (2024-08-12)
📊 19 papers total | 🔗 6 with code
🎯 Navigation by interest area
Pillar 3: Spatial Perception and Semantics (6)
Pillar 2: RL Algorithms and Architecture (6 🔗3)
Pillar 9: Embodied Foundation Models (3 🔗1)
Pillar 5: Interaction and Reaction (2 🔗1)
Pillar 6: Video Extraction and Matching (1 🔗1)
Pillar 1: Robot Control (1)
🔬 Pillar 3: Spatial Perception and Semantics (6 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | 3D Reconstruction of Protein Structures from Multi-view AFM Images using Neural Radiance Fields (NeRFs) | Reconstructs the 3D structure of protein complexes from multi-view atomic force microscopy (AFM) images using neural radiance fields. | NeRF, neural radiance field, height map | | |
| 2 | Towards Robust Monocular Depth Estimation in Non-Lambertian Surfaces | Proposes a monocular depth estimation method guided by non-Lambertian surface regions, improving robustness in transparent and specular scenes. | depth estimation, monocular depth, Depth Anything | | |
| 3 | FruitNeRF: A Unified Neural Radiance Field based Fruit Counting Framework | Proposes the FruitNeRF framework for 3D fruit counting. | neural radiance field, optical flow, foundation model | | |
| 4 | Mipmap-GS: Let Gaussians Deform with Scale-specific Mipmap for Anti-aliasing Rendering | Mipmap-GS: deforms Gaussians with a scale-specific mipmap to achieve anti-aliased rendering. | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 5 | HeadGAP: Few-Shot 3D Head Avatar via Generalizable Gaussian Priors | Proposes HeadGAP for few-shot 3D head avatar creation. | gaussian splatting, splatting | | |
| 6 | HeLiMOS: A Dataset for Moving Object Segmentation in 3D Point Clouds From Heterogeneous LiDAR Sensors | Introduces the HeLiMOS dataset for evaluating moving object segmentation with heterogeneous LiDAR sensors. | scene understanding | | |
🔬 Pillar 2: RL Algorithms and Architecture (6 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 7 | Efficient Visual Representation Learning with Heat Conduction Equation | Proposes HcNet, a visual representation learning framework based on the heat conduction equation, for efficient image feature extraction. | representation learning, foundation model | ✅ | |
| 8 | SkillMimic: Learning Basketball Interaction Skills from Demonstrations | SkillMimic: learns basketball interaction skills from demonstrations without hand-designed reward functions. | reinforcement learning, human-object interaction, HOI | ✅ | |
| 9 | Boosting Adverse Weather Crowd Counting via Multi-queue Contrastive Learning | Proposes MQCL, a model based on multi-queue contrastive learning, to improve crowd counting accuracy in adverse weather. | representation learning, contrastive learning | | |
| 10 | Deep Geometric Moments Promote Shape Consistency in Text-to-3D Generation | MT3D: uses deep geometric moments to improve shape consistency in text-to-3D generation. | distillation, geometric consistency | | |
| 11 | DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation | DEEPTalk: proposes dynamic emotion embeddings for probabilistic speech-driven 3D face animation. | contrastive learning, VQ-VAE | ✅ | |
| 12 | Multi-scale Contrastive Adaptor Learning for Segmenting Anything in Underperformed Scenes | Proposes MCA-SAM, a multi-scale contrastive adaptor learning method, to improve SAM's segmentation performance in underperformed scenes. | MAE, contrastive learning | | |
🔬 Pillar 9: Embodied Foundation Models (3 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 13 | ARPA: A Novel Hybrid Model for Advancing Visual Word Disambiguation Using Large Language Models and Transformers | ARPA: a hybrid model combining LLMs and Transformers to improve visual word sense disambiguation. | large language model, multimodal | | |
| 14 | Learning Collaborative Knowledge with Multimodal Representation for Polyp Re-Identification | Proposes a polyp Re-ID framework based on multimodal collaborative learning to improve computer-aided diagnosis of colorectal cancer. | multimodal | ✅ | |
| 15 | GlyphPattern: An Abstract Pattern Recognition Benchmark for Vision-Language Models | GlyphPattern: a new benchmark for evaluating the abstract pattern recognition ability of vision-language models. | large language model | | |
🔬 Pillar 5: Interaction and Reaction (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 16 | Unseen No More: Unlocking the Potential of CLIP for Generative Zero-shot HOI Detection | HOIGen: the first zero-shot HOI detection method based on a CLIP generative model, mitigating seen-unseen confusion. | human-object interaction, HOI | ✅ | |
| 17 | Blind-Match: Efficient Homomorphic Encryption-Based 1:N Matching for Privacy-Preserving Biometric Identification | Blind-Match: efficient, privacy-preserving 1:N biometric matching based on homomorphic encryption. | OMOMO | | |
🔬 Pillar 6: Video Extraction and Matching (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 18 | HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization | Proposes the History-Augmented Anchor Transformer (HAT) framework to improve online temporal action localization. | egocentric | ✅ | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 19 | Freehand Sketch Generation from Mechanical Components | Proposes MSFormer for generating freehand sketches of mechanical components. | humanoid | | |