cs.CV (2024-06-28)

📊 18 papers in total | 🔗 7 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (6 🔗3) | Pillar 3: Perception & Semantics (5 🔗1) | Pillar 7: Motion Retargeting (3 🔗1) | Pillar 2: RL & Architecture (3 🔗2) | Pillar 1: Robot Control (1)

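🔬 Pillar 9: Embodied Foundation Models (6 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment	MM-Instruct: generates visual instruction data to improve the instruction-following ability of large multimodal models	large language model multimodal instruction following	✅
2	Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs	Proposes the Web2Code dataset and evaluation framework to improve multimodal LLMs' webpage understanding and code generation	large language model multimodal	✅
3	Multimodal Prototyping for cancer survival prediction	Proposes a cancer survival prediction method based on multimodal prototype learning that significantly reduces computation and improves interpretability	multimodal
4	PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration	PathGen-1.6M: generates 1.6 million pathology image-text pairs through multi-agent collaboration to improve pathology VLM performance	large language model multimodal
5	EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model	Proposes EVF-SAM, which improves text-prompted SAM segmentation performance through early vision-language fusion	multimodal
6	InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV Shows	InfiniBench: a benchmark for large multimodal models on long videos, targeting movie and TV show understanding	multimodal	✅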

🔬 Pillar 3: Perception & Semantics (5 papers)

#	Title	One-line summary	Tags	🔗	⭐
7	EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting	EgoGaussian: understands dynamic scenes from egocentric video using 3D Gaussian Splatting	3D gaussian splatting gaussian splatting splatting
8	SpotlessSplats: Ignoring Distractors in 3D Gaussian Splatting	SpotlessSplats: removes distractors in 3D Gaussian Splatting using robust optimization and pretrained features	3D gaussian splatting 3DGS gaussian splatting
9	Deep Learning-based Depth Estimation Methods from Monocular Image and Videos: A Comprehensive Survey	A survey of deep learning methods for depth estimation from monocular images and videos: architectures, supervision, and evolution	depth estimation monocular depth
10	ASSR-NeRF: Arbitrary-Scale Super-Resolution on Voxel Grid for High-Quality Radiance Fields Reconstruction	Proposes ASSR-NeRF, which achieves high-quality radiance field reconstruction through arbitrary-scale super-resolution on a voxel grid	NeRF
11	LightStereo: Channel Boost Is All You Need for Efficient 2D Cost Aggregation	LightStereo: efficient stereo matching with 2D cost aggregation through channel boosting	scene flow	✅

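🔬 Pillar 7: Motion Retargeting (3 papers)

#	Title	One-line summary	Tags	🔗	⭐
12	FootBots: A Transformer-based Architecture for Motion Prediction in Soccer	FootBots: a Transformer-based architecture for soccer motion prediction that uses equivariance to improve prediction accuracy	motion prediction
13	MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance	MimicMotion: high-quality human motion video generation based on confidence-aware pose guidance	human motion	✅
14	Optimized 3D Point Labeling with Leaders Using the Beams Displacement Method	Proposes an optimized labeling method for 3D point features based on the beams displacement method, addressing label overlap and direction deviation	spatial relationship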

🔬 Pillar 2: RL & Architecture (3 papers)

#	Title	One-line summary	Tags	🔗	⭐
15	Structure-aware World Model for Probe Guidance via Large-scale Self-supervised Pre-train	Proposes a structure-aware world model that improves ultrasound probe guidance accuracy through large-scale self-supervised pre-training	world model spatial relationship
16	CSAKD: Knowledge Distillation with Cross Self-Attention for Hyperspectral and Multispectral Image Fusion	Proposes CSAKD, a knowledge distillation model based on cross self-attention, for hyperspectral and multispectral image fusion	distillation HSI	✅
17	PopAlign: Population-Level Alignment for Fair Text-to-Image Generation	Proposes PopAlign to address population-level bias in text-to-image generation	reinforcement learning RLHF DPO	✅

🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
18	SemUV: Deep Learning based semantic manipulation over UV texture map of virtual human heads	SemUV: proposes a deep learning-based method for semantic manipulation of faces in UV texture space	manipulation
