cs.CV(2024-08-12)

📊 19 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (6) · Pillar 2: RL & Architecture (6 🔗3) · Pillar 9: Embodied Foundation Models (3 🔗1) · Pillar 5: Interaction & Reaction (2 🔗1) · Pillar 6: Video Extraction (1 🔗1) · Pillar 1: Robot Control (1)

🔬 Pillar 3: Perception & Semantics (6 papers)

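| # | Title | One-sentence summary | Tags |
|---|-------|----------------------|------|
| 1 | 3D Reconstruction of Protein Structures from Multi-view AFM Images using Neural Radiance Fields (NeRFs) | Reconstructs the 3D structure of protein complexes from multi-view atomic force microscopy images using neural radiance fields. | NeRF, neural radiance field, height map |
| 2 | Towards Robust Monocular Depth Estimation in Non-Lambertian Surfaces | Proposes a monocular depth estimation method guided by non-Lambertian surface regions, improving robustness in transparent and specular scenes. | depth estimation, monocular depth, Depth Anything |
| 3 | FruitNeRF: A Unified Neural Radiance Field based Fruit Counting Framework | Proposes the FruitNeRF framework for 3D fruit counting. | neural radiance field, optical flow, foundation model |
| 4 | Mipmap-GS: Let Gaussians Deform with Scale-specific Mipmap for Anti-aliasing Rendering | Mipmap-GS: deforms Gaussians with scale-specific mipmaps to achieve anti-aliased rendering. | 3D gaussian splatting, 3DGS, gaussian splatting |
| 5 | HeadGAP: Few-Shot 3D Head Avatar via Generalizable Gaussian Priors | Proposes HeadGAP for few-shot 3D head avatar creation. | gaussian splatting, splatting |
| 6 | HeLiMOS: A Dataset for Moving Object Segmentation in 3D Point Clouds From Heterogeneous LiDAR Sensors | Proposes the HeLiMOS dataset for evaluating moving object segmentation across heterogeneous LiDAR sensors. | scene understanding |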

🔬 Pillar 2: RL & Architecture (6 papers)

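| # | Title | One-sentence summary | Tags |
|---|-------|----------------------|------|
| 7 | Efficient Visual Representation Learning with Heat Conduction Equation | Proposes HcNet, a visual representation learning framework based on the heat conduction equation, for efficient image feature extraction. | representation learning, foundation model |
| 8 | SkillMimic: Learning Basketball Interaction Skills from Demonstrations | SkillMimic: learns basketball interaction skills from demonstrations without hand-designed reward functions. | reinforcement learning, human-object interaction, HOI |
| 9 | Boosting Adverse Weather Crowd Counting via Multi-queue Contrastive Learning | Proposes MQCL, a model based on multi-queue contrastive learning, to improve crowd counting accuracy in adverse weather. | representation learning, contrastive learning |
| 10 | Deep Geometric Moments Promote Shape Consistency in Text-to-3D Generation | MT3D: uses deep geometric moments to improve shape consistency in text-to-3D generation. | distillation, geometric consistency |
| 11 | DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation | DEEPTalk: proposes dynamic emotion embedding for probabilistic speech-driven 3D face animation. | contrastive learning, VQ-VAE |
| 12 | Multi-scale Contrastive Adaptor Learning for Segmenting Anything in Underperformed Scenes | Proposes MCA-SAM, a multi-scale contrastive adaptor learning method, to improve SAM's segmentation performance in underperformed scenes. | MAE, contrastive learning |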

🔬 Pillar 9: Embodied Foundation Models (3 papers)

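| # | Title | One-sentence summary | Tags |
|---|-------|----------------------|------|
| 13 | ARPA: A Novel Hybrid Model for Advancing Visual Word Disambiguation Using Large Language Models and Transformers | ARPA: a hybrid model combining LLMs and Transformers to improve visual word sense disambiguation. | large language model, multimodal |
| 14 | Learning Collaborative Knowledge with Multimodal Representation for Polyp Re-Identification | Proposes a polyp Re-ID framework based on multimodal collaborative learning to improve computer-aided diagnosis of colorectal cancer. | multimodal |
| 15 | GlyphPattern: An Abstract Pattern Recognition Benchmark for Vision-Language Models | GlyphPattern: a new benchmark for evaluating the abstract pattern recognition ability of vision-language models. | large language model |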

🔬 Pillar 5: Interaction & Reaction (2 papers)

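| # | Title | One-sentence summary | Tags |
|---|-------|----------------------|------|
| 16 | Unseen No More: Unlocking the Potential of CLIP for Generative Zero-shot HOI Detection | HOIGen: the first CLIP-based generative method for zero-shot HOI detection, mitigating seen-unseen confusion. | human-object interaction, HOI |
| 17 | Blind-Match: Efficient Homomorphic Encryption-Based 1:N Matching for Privacy-Preserving Biometric Identification | Blind-Match: efficient privacy-preserving 1:N biometric matching based on homomorphic encryption. | OMOMO |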

🔬 Pillar 6: Video Extraction (1 paper)

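| # | Title | One-sentence summary | Tags |
|---|-------|----------------------|------|
| 18 | HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization | Proposes the History-Augmented Anchor Transformer (HAT) framework to improve online temporal action localization. | egocentric |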

🔬 Pillar 1: Robot Control (1 paper)

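| # | Title | One-sentence summary | Tags |
|---|-------|----------------------|------|
| 19 | Freehand Sketch Generation from Mechanical Components | Proposes MSFormer for generating freehand sketches of mechanical components. | humanoid |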
