cs.CV (2025-01-05)

📊 9 papers in total | 🔗 3 with code

🎯 Navigation by Interest Area

Pillar 9: Embodied Foundation Models (3 🔗1) · Pillar 3: Perception & Semantics (3 🔗1) · Pillar 1: Robot Control (2 🔗1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 9: Embodied Foundation Models (3 papers)

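#	Title	One-sentence summary	Tags	🔗	⭐
1	Vision-Driven Prompt Optimization for Large Language Models in Multimodal Generative Tasks	Proposes Vision-Driven Prompt Optimization (VDPO) to improve the image generation quality of large language models in multimodal generative tasks	large language model, multimodal
2	Face-MakeUp: Multimodal Facial Prompts for Text-to-Image Generation	Face-MakeUp: uses multimodal facial prompts to improve face quality in text-to-image generation	multimodal	✅
3	FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance	FOLDER: accelerates multimodal large language models while improving performance	large language model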

🔬 Pillar 3: Perception & Semantics (3 papers)

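#	Title	One-sentence summary	Tags	🔗	⭐
4	Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera	Proposes Depth Any Camera (DAC) for zero-shot metric depth estimation from any camera	depth estimation, metric depth, foundation model
5	DepthMaster: Taming Diffusion Models for Monocular Depth Estimation	DepthMaster: uses diffusion models to improve generalization and detail preservation in monocular depth estimation	depth estimation, monocular depth	✅
6	AHMSA-Net: Adaptive Hierarchical Multi-Scale Attention Network for Micro-Expression Recognition	Proposes AHMSA-Net, an adaptive hierarchical multi-scale attention network that improves micro-expression recognition accuracy	optical flow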

🔬 Pillar 1: Robot Control (2 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
7	GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields through Efficient Dense 3D Point Tracking	GS-DiT: advances video generation with efficient dense 3D point tracking and pseudo 4D Gaussian fields	manipulation, gaussian splatting, splatting	✅
8	Boosting Edge Detection with Pixel-wise Feature Selection: The Extractor-Selector Paradigm	Proposes the Extractor-Selector paradigm, which improves edge detection accuracy through pixel-wise feature selection	biped

🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
9	Evolving Skeletons: Motion Dynamics in Action Recognition	Proposes motion-augmented skeleton sequences to improve action recognition	spatiotemporal
