cs.CV(2025-01-05)

📊 9 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (3 🔗1) · Pillar 3: Perception & Semantics (3 🔗1) · Pillar 1: Robot Control (2 🔗1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 9: Embodied Foundation Models (3 papers)

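| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 1 | Vision-Driven Prompt Optimization for Large Language Models in Multimodal Generative Tasks | Proposes Vision-Driven Prompt Optimization (VDPO) to improve the image generation quality of large language models in multimodal generative tasks. | large language model, multimodal | |
| 2 | Face-MakeUp: Multimodal Facial Prompts for Text-to-Image Generation | Face-MakeUp: uses multimodal facial prompts to improve face quality in text-to-image generation. | multimodal | |
| 3 | FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance | FOLDER: accelerates multimodal large language models while enhancing performance. | large language model | |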

🔬 Pillar 3: Perception & Semantics (3 papers)

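| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 4 | Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera | Proposes Depth Any Camera (DAC) for zero-shot metric depth estimation from any camera. | depth estimation, metric depth, foundation model | |
| 5 | DepthMaster: Taming Diffusion Models for Monocular Depth Estimation | DepthMaster: uses diffusion models to improve generalization and detail preservation in monocular depth estimation. | depth estimation, monocular depth | |
| 6 | AHMSA-Net: Adaptive Hierarchical Multi-Scale Attention Network for Micro-Expression Recognition | Proposes AHMSA-Net, an adaptive hierarchical multi-scale attention network that improves micro-expression recognition accuracy. | optical flow | |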

🔬 Pillar 1: Robot Control (2 papers)

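| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 7 | GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields through Efficient Dense 3D Point Tracking | GS-DiT: advances video generation through efficient dense 3D point tracking and pseudo 4D Gaussian fields. | manipulation, gaussian splatting, splatting | |
| 8 | Boosting Edge Detection with Pixel-wise Feature Selection: The Extractor-Selector Paradigm | Proposes the Extractor-Selector paradigm, which improves edge detection accuracy through pixel-wise feature selection. | biped | |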

🔬 Pillar 8: Physics-based Animation (1 paper)

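| # | Title | One-sentence summary | Tags | 🔗 |
|---|-------|----------------------|------|----|
| 9 | Evolving Skeletons: Motion Dynamics in Action Recognition | Proposes motion-enhanced skeleton sequences to improve action recognition. | spatiotemporal | |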
