cs.CV (2025-01-30)

📊 18 papers in total | 🔗 4 with code

🎯 Browse by interest area

Pillar 9: Embodied Foundation Models (12 🔗3) · Pillar 1: Robot Control (3) · Pillar 3: Perception & Semantics (2) · Pillar 2: RL & Architecture (1 🔗1)

🔬 Pillar 9: Embodied Foundation Models (12 papers)

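# | Title | One-line summary | Tags | 🔗
1 | Advances in Multimodal Adaptation and Generalization: From Traditional Approaches to Foundation Models | Surveys research on multimodal adaptation and generalization, from traditional approaches to multimodal pretrained foundation models. | foundation model, multimodal
2 | Rethinking Bottlenecks in Safety Fine-Tuning of Vision Language Models | Proposes the MIS dataset to improve the visual reasoning of vision-language models in safety scenarios. | multimodal, instruction following, chain-of-thought
3 | High-Accuracy ECG Image Interpretation using Parameter-Efficient LoRA Fine-Tuning with Multimodal LLaMA 3.2 | Fine-tunes the multimodal LLaMA 3.2 model with parameter-efficient LoRA for high-accuracy ECG image interpretation. | multimodal
4 | Tuning Vision Foundation Model via Test-Time Prompt-Guided Training for VFSS Segmentations | Proposes test-time prompt-guided training to improve the performance of vision foundation models on VFSS segmentation. | foundation model
5 | AGAV-Rater: Adapting Large Multimodal Model for AI-Generated Audio-Visual Quality Assessment | Proposes AGAV-Rater, which uses a large multimodal model to assess the quality of AI-generated audio-visual content and improve user experience. | multimodal
6 | Efficient Audiovisual Speech Processing via MUTUD: Multimodal Training and Unimodal Deployment | Proposes the MUTUD framework for efficient speech processing with multimodal training and unimodal deployment. | multimodal
7 | Every Image Listens, Every Image Dances: Music-Driven Image Animation | MuseDance: proposes a music-driven image animation model that needs no complex motion guidance. | multimodal
8 | Multispectral 3D mapping on a Roman sculpture to study ancient polychromy | Proposes a color analysis method for a Roman sculpture based on multispectral 3D modeling, for studying ancient polychromy. | multimodal
9 | Human Re-ID Meets LVLMs: What can we expect? | Evaluates the performance and limitations of large vision-language models on person re-identification. | multimodal
10 | Foundational Models for 3D Point Clouds: A Survey and Outlook | Surveys foundation models for 3D point clouds, filling the gap left by the lack of a comprehensive, in-depth literature review in the field. | foundation model
11 | CLEAR: Cue Learning using Evolution for Accurate Recognition Applied to Sustainability Data Extraction | Proposes the CLEAR framework, which uses an evolutionary algorithm to optimize prompts and improve LLM image recognition accuracy in sustainability data extraction. | large language model
12 | A Video-grounded Dialogue Dataset and Metric for Event-driven Activities | Proposes the VDAct video dialogue dataset and the VDEval evaluation metric for understanding event-driven activities. | foundation model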

🔬 Pillar 1: Robot Control (3 papers)

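# | Title | One-line summary | Tags | 🔗
13 | Free-T2M: Robust Text-to-Motion Generation for Humanoid Robots via Frequency-Domain | Free-T2M: achieves robust text-to-motion generation for humanoid robots through frequency-domain enhancement. | humanoid, humanoid robot, text-to-motion
14 | Strong and Controllable 3D Motion Generation | Proposes Motion ControlNet, which speeds up and precisely controls 3D human motion generation and suits real-time interactive scenarios. | manipulation, linear attention, text-to-motion
15 | Motion Diffusion Autoencoders: Enabling Attribute Manipulation in Human Motion Demonstrated on Karate Techniques | Proposes an attribute manipulation method based on motion diffusion autoencoders, applied to karate movements. | manipulation, motion diffusion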

🔬 Pillar 3: Perception & Semantics (2 papers)

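# | Title | One-line summary | Tags | 🔗
16 | REMOTE: Real-time Ego-motion Tracking for Various Endoscopes via Multimodal Visual Feature Learning | Proposes REMOTE, which tracks endoscope pose in real time via multimodal visual feature learning. | optical flow, motion tracking, multimodal
17 | Zero-Shot Novel View and Depth Synthesis with Multi-View Geometric Diffusion | Proposes Multi-View Geometric Diffusion (MVGD), a diffusion-based model for zero-shot novel-view image and depth synthesis. | depth estimation, scene reconstruction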

🔬 Pillar 2: RL & Architecture (1 paper)

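# | Title | One-line summary | Tags | 🔗
18 | HSRMamba: Contextual Spatial-Spectral State Space Model for Single Image Hyperspectral Super-Resolution | Proposes HSRMamba, which uses a contextual spatial-spectral state space model for single-image hyperspectral super-resolution. | Mamba, state space model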
