cs.CV (2025-01-30)

📊 18 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (12 🔗3) Pillar 1: Robot Control (3) Pillar 3: Spatial Perception & Semantics (2) Pillar 2: RL Algorithms & Architecture (1 🔗1)

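🔬 Pillar 9: Embodied Foundation Models (12 papers)

#	Title	Key Point	Tags	🔗	⭐
1	Advances in Multimodal Adaptation and Generalization: From Traditional Approaches to Foundation Models	Surveys multimodal adaptation and generalization research, from traditional methods to multimodal pretrained foundation models	foundation model, multimodal	✅
2	Rethinking Bottlenecks in Safety Fine-Tuning of Vision Language Models	Proposes the MIS dataset to improve the visual reasoning of vision-language models in safety scenarios	multimodal, instruction following, chain-of-thought
3	High-Accuracy ECG Image Interpretation using Parameter-Efficient LoRA Fine-Tuning with Multimodal LLaMA 3.2	Fine-tunes the multimodal LLaMA 3.2 model with parameter-efficient LoRA for high-accuracy ECG image interpretation	multimodal
4	Tuning Vision Foundation Model via Test-Time Prompt-Guided Training for VFSS Segmentations	Proposes test-time prompt-guided training to improve vision foundation model performance on VFSS segmentation	foundation model
5	AGAV-Rater: Adapting Large Multimodal Model for AI-Generated Audio-Visual Quality Assessment	Proposes AGAV-Rater, which uses a large multimodal model to assess the quality of AI-generated audio-visual content and improve user experience	multimodal	✅
6	Efficient Audiovisual Speech Processing via MUTUD: Multimodal Training and Unimodal Deployment	Proposes the MUTUD framework for efficient speech processing with multimodal training and unimodal deployment	multimodal
7	Every Image Listens, Every Image Dances: Music-Driven Image Animation	MuseDance: a music-driven image animation model that requires no complex motion guidance	multimodal
8	Multispectral 3D mapping on a Roman sculpture to study ancient polychromy	Proposes a multispectral 3D modeling method for color analysis of a Roman sculpture, to study ancient polychromy	multimodal
9	Human Re-ID Meets LVLMs: What can we expect?	Evaluates the performance and limitations of large vision-language models on person re-identification	multimodal
10	Foundational Models for 3D Point Clouds: A Survey and Outlook	Surveys foundation models for 3D point clouds, filling the gap of a comprehensive, in-depth literature review in the field	foundation model	✅
11	CLEAR: Cue Learning using Evolution for Accurate Recognition Applied to Sustainability Data Extraction	Proposes the CLEAR framework, which optimizes prompts with an evolutionary algorithm to improve LLM image recognition accuracy in sustainability data extraction	large language model
12	A Video-grounded Dialogue Dataset and Metric for Event-driven Activities	Proposes the VDAct video dialogue dataset and the VDEval metric for understanding event-driven activities	foundation model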

🔬 Pillar 1: Robot Control (3 papers)

#	Title	Key Point	Tags	🔗	⭐
13	Free-T2M: Robust Text-to-Motion Generation for Humanoid Robots via Frequency-Domain	Free-T2M: robust text-to-motion generation for humanoid robots via frequency-domain enhancement	humanoid, humanoid robot, text-to-motion
14	Strong and Controllable 3D Motion Generation	Proposes Motion ControlNet to accelerate and precisely control 3D human motion generation, suited to real-time interactive scenarios	manipulation, linear attention, text-to-motion
15	Motion Diffusion Autoencoders: Enabling Attribute Manipulation in Human Motion Demonstrated on Karate Techniques	Proposes an attribute manipulation method based on motion diffusion autoencoders, applied to karate techniques	manipulation, motion diffusion

🔬 Pillar 3: Spatial Perception & Semantics (2 papers)

#	Title	Key Point	Tags	🔗	⭐
16	REMOTE: Real-time Ego-motion Tracking for Various Endoscopes via Multimodal Visual Feature Learning	Proposes REMOTE, which achieves real-time endoscope pose tracking through multimodal visual feature learning	optical flow, motion tracking, multimodal
17	Zero-Shot Novel View and Depth Synthesis with Multi-View Geometric Diffusion	Proposes Multi-View Geometric Diffusion (MVGD), a diffusion-based model for zero-shot novel view image and depth synthesis	depth estimation, scene reconstruction

🔬 Pillar 2: RL Algorithms & Architecture (1 paper)

#	Title	Key Point	Tags	🔗	⭐
18	HSRMamba: Contextual Spatial-Spectral State Space Model for Single Image Hyperspectral Super-Resolution	Proposes HSRMamba, which uses a contextual spatial-spectral state space model for single-image hyperspectral super-resolution	Mamba, state space model	✅
