cs.CV (2023-12-08)

📊 17 papers total | 🔗 3 with code

🎯 Navigation by Interest Area

Pillar 2: RL & Architecture (5 🔗1) · Pillar 3: Perception & Semantics (4) · Pillar 9: Embodied Foundation Models (3) · Pillar 4: Generative Motion (2 🔗1) · Pillar 8: Physics-based Animation (1) · Pillar 1: Robot Control (1) · Pillar 6: Video Extraction (1 🔗1)

🔬 Pillar 2: RL & Architecture (5 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
1	Learn to Optimize Denoising Scores for 3D Generation: A Unified and Improved Diffusion Prior on NeRF and 3D Gaussian Splatting	Proposes a unified framework that optimizes the diffusion prior for 3D generation, markedly improving results on NeRF and 3D Gaussian Splatting	distillation 3D gaussian splatting gaussian splatting
2	Prospective Role of Foundation Models in Advancing Autonomous Vehicles	Explores the potential role of foundation models in improving scene understanding and safety for autonomous driving	world model dreamer scene understanding
3	SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation	SwiftBrush: a one-step text-to-image diffusion model based on variational score distillation	distillation neural radiance field
4	RL Dreams: Policy Gradient Optimization for Score Distillation based 3D Generation	Proposes DDPO3D, which uses policy gradients to optimize score distillation and improve text-to-3D generation quality	diffusion policy distillation
5	Disentangled Clothed Avatar Generation from Text Descriptions	Proposes SO-SMPL to disentangle clothing from the body, enabling high-quality, animatable, text-driven clothed avatar generation	distillation SMPL character animation	✅

🔬 Pillar 3: Perception & Semantics (4 papers)

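#	Title	One-Sentence Summary	Tags	🔗	⭐
6	Fine Dense Alignment of Image Bursts through Camera Pose and Depth Estimation	Proposes a method for fine, dense alignment of image bursts via camera pose and depth estimation	depth estimation optical flow
7	Multi-view Inversion for 3D-aware Generative Adversarial Networks	Proposes a multi-view inversion method for 3D GANs that improves geometric accuracy and image quality in human head reconstruction	NeRF scene reconstruction
8	TriHuman : A Real-time and Controllable Tri-plane Representation for Detailed Human Geometry and Appearance Synthesis	Proposes TriHuman, a real-time, controllable tri-plane method for synthesizing human geometry and appearance	NeRF neural radiance field
9	360° Volumetric Portrait Avatar	Proposes 3VP Avatar, which reconstructs a photorealistic 360° portrait avatar from monocular video alone	neural radiance field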

🔬 Pillar 9: Embodied Foundation Models (3 papers)

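#	Title	One-Sentence Summary	Tags	🔗	⭐
10	GlitchBench: Can large multimodal models detect video game glitches?	Proposes the GlitchBench benchmark for evaluating how well large multimodal models detect video game glitches	large language model multimodal
11	Visual Grounding of Whole Radiology Reports for 3D CT Images	Proposes a visual grounding framework for radiology reports on 3D CT images, improving anomaly detection accuracy	visual grounding
12	User-Aware Prefix-Tuning is a Good Learner for Personalized Image Captioning	Proposes a user-aware prefix-tuning framework for personalized image captioning	large language model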

🔬 Pillar 4: Generative Motion (2 papers)

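#	Title	One-Sentence Summary	Tags	🔗	⭐
13	3D Copy-Paste: Physically Plausible Object Insertion for Monocular 3D Detection	Proposes a 3D copy-paste method grounded in physical constraints, improving monocular 3D object detection performance	physically plausible	✅
14	HandDiffuse: Generative Controllers for Two-Hand Interactions via Diffusion Models	HandDiffuse: uses diffusion models to generate controllable two-hand interaction motion	motion generation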

🔬 Pillar 8: Physics-based Animation (1 paper)

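#	Title	One-Sentence Summary	Tags	🔗	⭐
15	Quantitative perfusion maps using a novelty spatiotemporal convolutional neural network	Proposes ST-Net, a novel spatiotemporal convolutional neural network for accurately estimating perfusion parameters from DSC-MRI	spatiotemporal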

🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
16	Reality's Canvas, Language's Brush: Crafting 3D Avatars from Monocular Video	ReCaLaB: high-fidelity, controllable 3D human avatar generation driven by monocular video	manipulation NeRF

🔬 Pillar 6: Video Extraction (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
17	Reconstructing Hands in 3D with Transformers	HaMeR: a Transformer-based method for 3D hand reconstruction from monocular images	hand reconstruction	✅
