cs.CV (2025-01-05)
📊 9 papers in total | 🔗 3 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (3 🔗1)
Pillar 3: Perception & Semantics (3 🔗1)
Pillar 1: Robot Control (2 🔗1)
Pillar 8: Physics-based Animation (1)
🔬 Pillar 9: Embodied Foundation Models (3 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | Vision-Driven Prompt Optimization for Large Language Models in Multimodal Generative Tasks | Proposes Vision-Driven Prompt Optimization (VDPO) to improve the image generation quality of large language models in multimodal generative tasks. | large language model multimodal | | |
| 2 | Face-MakeUp: Multimodal Facial Prompts for Text-to-Image Generation | Face-MakeUp: uses multimodal facial prompts to improve face quality in text-to-image generation | multimodal | ✅ | |
| 3 | FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance | FOLDER: accelerates multimodal large language models with enhanced performance | large language model | | |
🔬 Pillar 3: Perception & Semantics (3 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 4 | Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera | Proposes Depth Any Camera (DAC) for zero-shot metric depth estimation from any camera | depth estimation metric depth foundation model | | |
| 5 | DepthMaster: Taming Diffusion Models for Monocular Depth Estimation | DepthMaster: uses diffusion models to improve generalization and detail preservation in monocular depth estimation | depth estimation monocular depth | ✅ | |
| 6 | AHMSA-Net: Adaptive Hierarchical Multi-Scale Attention Network for Micro-Expression Recognition | Proposes AHMSA-Net, an adaptive hierarchical multi-scale attention network that improves micro-expression recognition accuracy. | optical flow | | |
🔬 Pillar 1: Robot Control (2 papers)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 7 | GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields through Efficient Dense 3D Point Tracking | GS-DiT: advances video generation through efficient dense 3D point tracking and pseudo 4D Gaussian fields | manipulation gaussian splatting splatting | ✅ | |
| 8 | Boosting Edge Detection with Pixel-wise Feature Selection: The Extractor-Selector Paradigm | Proposes the Extractor-Selector paradigm, which improves edge detection accuracy through pixel-wise feature selection. | biped | | |
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | One-Sentence Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 9 | Evolving Skeletons: Motion Dynamics in Action Recognition | Proposes motion-enhanced skeleton sequences to improve action recognition | spatiotemporal | | |