cs.CV (2024-06-28)

📊 18 papers in total | 🔗 7 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (6 🔗3) | Pillar 3: Perception & Semantics (5 🔗1) | Pillar 7: Motion Retargeting (3 🔗1) | Pillar 2: RL & Architecture (3 🔗2) | Pillar 1: Robot Control (1)

🔬 Pillar 9: Embodied Foundation Models (6 papers)

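# | Title | One-line summary | Tags | 🔗
1 | MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment | MM-Instruct: generates visual instruction data to improve the instruction-following ability of large multimodal models | large language model, multimodal, instruction following
2 | Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs | Proposes the Web2Code dataset and evaluation framework to improve multimodal LLMs' webpage understanding and code generation | large language model, multimodal
3 | Multimodal Prototyping for cancer survival prediction | Proposes a cancer survival prediction method based on multimodal prototype learning that significantly reduces computation and improves interpretability | multimodal
4 | PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration | PathGen-1.6M: generates 1.6 million pathology image-text pairs through multi-agent collaboration to improve pathology VLM performance | large language model, multimodal
5 | EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model | Proposes EVF-SAM, which improves the segmentation performance of text-prompted SAM through early vision-language fusion | multimodal
6 | InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV Shows | InfiniBench: a benchmark for large multimodal models on long videos, targeting movie and TV show understanding | multimodal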

🔬 Pillar 3: Perception & Semantics (5 papers)

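# | Title | One-line summary | Tags | 🔗
7 | EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting | EgoGaussian: understands dynamic scenes from egocentric video using 3D Gaussian Splatting | 3D gaussian splatting, gaussian splatting, splatting
8 | SpotlessSplats: Ignoring Distractors in 3D Gaussian Splatting | SpotlessSplats: uses robust optimization and pretrained features to remove distractors in 3D Gaussian Splatting | 3D gaussian splatting, 3DGS, gaussian splatting
9 | Deep Learning-based Depth Estimation Methods from Monocular Image and Videos: A Comprehensive Survey | A survey of deep learning methods for depth estimation from monocular images and videos: architectures, supervision, and evolution | depth estimation, monocular depth
10 | ASSR-NeRF: Arbitrary-Scale Super-Resolution on Voxel Grid for High-Quality Radiance Fields Reconstruction | Proposes ASSR-NeRF, which achieves high-quality radiance field reconstruction through arbitrary-scale super-resolution on a voxel grid | NeRF
11 | LightStereo: Channel Boost Is All You Need for Efficient 2D Cost Aggregation | LightStereo: efficient stereo matching with 2D cost aggregation via channel boosting | scene flow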

🔬 Pillar 7: Motion Retargeting (3 papers)

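# | Title | One-line summary | Tags | 🔗
12 | FootBots: A Transformer-based Architecture for Motion Prediction in Soccer | FootBots: a Transformer-based architecture for soccer motion prediction that exploits equivariance to improve prediction accuracy | motion prediction
13 | MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance | MimicMotion: high-quality human motion video generation based on confidence-aware pose guidance | human motion
14 | Optimized 3D Point Labeling with Leaders Using the Beams Displacement Method | Proposes an optimized labeling method for 3D point features based on the beams displacement method, addressing label overlap and direction deviation | spatial relationship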

🔬 Pillar 2: RL & Architecture (3 papers)

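# | Title | One-line summary | Tags | 🔗
15 | Structure-aware World Model for Probe Guidance via Large-scale Self-supervised Pre-train | Proposes a structure-aware world model that improves ultrasound probe guidance accuracy through large-scale self-supervised pre-training | world model, spatial relationship
16 | CSAKD: Knowledge Distillation with Cross Self-Attention for Hyperspectral and Multispectral Image Fusion | Proposes CSAKD, a model based on cross self-attention knowledge distillation for hyperspectral and multispectral image fusion | distillation, HSI
17 | PopAlign: Population-Level Alignment for Fair Text-to-Image Generation | Proposes PopAlign to address population-level bias in text-to-image generation | reinforcement learning, RLHF, DPO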

🔬 Pillar 1: Robot Control (1 paper)

# | Title | One-line summary | Tags | 🔗
18 | SemUV: Deep Learning based semantic manipulation over UV texture map of virtual human heads | SemUV: a deep learning-based method for semantic face manipulation in UV texture space | manipulation
