cs.CV (2025-02-14)

📊 20 papers in total | 🔗 10 with code

🎯 Navigation by interest area

Pillar 9: Embodied Foundation Models (9 🔗6) Pillar 2: RL & Architecture (3 🔗1) Pillar 3: Perception & Semantics (3 🔗1) Pillar 1: Robot Control (2 🔗1) Pillar 4: Generative Motion (1) Pillar 7: Motion Retargeting (1) Pillar 6: Video Extraction (1 🔗1)

🔬 Pillar 9: Embodied Foundation Models (9 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	Insect-Foundation: A Foundation Model and Large Multimodal Dataset for Vision-Language Insect Understanding	Proposes Insect-LLaVA, a multimodal foundation model and dataset for visual insect understanding	foundation model multimodal
2	Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence	Granite Vision: a lightweight open-source multimodal model designed for enterprise intelligence	large language model multimodal instruction following	✅
3	V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models	Proposes V2V-LLM to address perception and planning in vehicle-to-vehicle cooperative autonomous driving	large language model	✅
4	PolyPath: Adapting a Large Multimodal Model for Multi-slide Pathology Report Generation	PolyPath: uses a large multimodal model to generate multi-slide pathology reports	multimodal
5	TSP3D: Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding	Proposes TSP3D, text-guided sparse voxel pruning for efficient 3D visual grounding	visual grounding	✅
6	Interpretable Concept-based Deep Learning Framework for Multimodal Human Behavior Modeling	Proposes the Attention-Guided Concept Model (AGCM) for interpretable multimodal human behavior modeling	multimodal
7	KKA: Improving Vision Anomaly Detection through Anomaly-related Knowledge from Large Language Models	Proposes KKA, which uses anomaly-related knowledge from large language models to improve visual anomaly detection	large language model	✅
8	A Survey of Safety on Large Vision-Language Models: Attacks, Defenses and Evaluations	Survey: a comprehensive analysis of the safety of large vision-language models (LVLMs), covering attacks, defenses, and evaluations	multimodal	✅
9	TaskGalaxy: Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types	TaskGalaxy: scales multimodal instruction fine-tuning with tens of thousands of vision task types	multimodal	✅

🔬 Pillar 2: RL & Architecture (3 papers)

#	Title	One-line summary	Tags	🔗	⭐
10	Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model	Step-Video-T2V: proposes a 30-billion-parameter pretrained text-to-video model that generates high-quality long videos	flow matching DPO foundation model	✅
11	HIPPo: Harnessing Image-to-3D Priors for Model-free Zero-shot 6D Pose Estimation	HIPPo: uses image-to-3D priors for model-free zero-shot 6D pose estimation	dreamer 6D pose estimation foundation model
12	Self-Consistent Model-based Adaptation for Visual Reinforcement Learning	Proposes Self-Consistent Model-based Adaptation (SCMA) to improve the robustness of visual reinforcement learning in distracting environments	reinforcement learning world model

🔬 Pillar 3: Perception & Semantics (3 papers)

#	Title	One-line summary	Tags	🔗	⭐
13	ReStyle3D: Scene-Level Appearance Transfer with Semantic Correspondences	ReStyle3D: a scene-level appearance transfer framework based on semantic correspondences	monocular depth open-vocabulary open vocabulary
14	RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control	RealCam-I2V: real-world image-to-video generation based on monocular depth estimation and interactive camera control	depth estimation metric depth scene reconstruction	✅
15	Multi-view 3D surface reconstruction from SAR images by inverse rendering	Proposes an inverse-rendering-based 3D reconstruction method for SAR images that does not require strict interferometric constraints	neural radiance field

🔬 Pillar 1: Robot Control (2 papers)

#	Title	One-line summary	Tags	🔗	⭐
16	ManiTrend: Bridging Future Generation and Action Prediction with 3D Flow for Robotic Manipulation	ManiTrend: uses 3D flow to bridge future generation and action prediction for robotic manipulation	manipulation cross-embodiment spatiotemporal
17	A Lightweight and Effective Image Tampering Localization Network with Vision Mamba	Proposes ForMa, a lightweight image tampering localization network based on Vision Mamba, for efficient global dependency modeling	manipulation Mamba state space model	✅

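🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
18	Classifier-free Guidance with Adaptive Scaling	Proposes β-CFG, which adaptively adjusts the guidance strength of diffusion models to balance image quality and text alignment	classifier-free guidance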

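🔬 Pillar 7: Motion Retargeting (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
19	Quantifying the Impact of Motion on 2D Gaze Estimation in Real-World Mobile Interactions	Quantifies the impact of motion on 2D gaze estimation in mobile interactions, revealing accuracy degradation in dynamic scenes	spatial relationship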

🔬 Pillar 6: Video Extraction (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
20	Conditional Latent Coding with Learnable Synthesized Reference for Deep Image Compression	Proposes conditional latent coding with a learnable synthesized reference for deep image compression	feature matching	✅
