cs.CV (2025-02-14)

📊 20 papers in total | 🔗 10 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (9 🔗6) · Pillar 2: RL & Architecture (3 🔗1) · Pillar 3: Perception & Semantics (3 🔗1) · Pillar 1: Robot Control (2 🔗1) · Pillar 4: Generative Motion (1) · Pillar 7: Motion Retargeting (1) · Pillar 6: Video Extraction (1 🔗1)

🔬 Pillar 9: Embodied Foundation Models (9 papers)

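| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 1 | Insect-Foundation: A Foundation Model and Large Multimodal Dataset for Vision-Language Insect Understanding | Proposes Insect-LLaVA, a multimodal foundation model and dataset for visual insect understanding | foundation model, multimodal |
| 2 | Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence | Granite Vision: a lightweight open-source multimodal model designed for enterprise intelligence | large language model, multimodal, instruction following |
| 3 | V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models | Proposes V2V-LLM to address perception and planning in vehicle-to-vehicle cooperative autonomous driving | large language model |
| 4 | PolyPath: Adapting a Large Multimodal Model for Multi-slide Pathology Report Generation | PolyPath: uses a large multimodal model to generate multi-slide pathology reports | multimodal |
| 5 | TSP3D: Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding | Proposes TSP3D, text-guided sparse voxel pruning for efficient 3D visual grounding | visual grounding |
| 6 | Interpretable Concept-based Deep Learning Framework for Multimodal Human Behavior Modeling | Proposes the Attention-Guided Concept Model (AGCM) for interpretable multimodal human behavior modeling | multimodal |
| 7 | KKA: Improving Vision Anomaly Detection through Anomaly-related Knowledge from Large Language Models | Proposes KKA, which uses anomaly-related knowledge from large language models to improve visual anomaly detection | large language model |
| 8 | A Survey of Safety on Large Vision-Language Models: Attacks, Defenses and Evaluations | Survey: a comprehensive analysis of the safety of large vision-language models (LVLMs), covering attacks, defenses, and evaluations | multimodal |
| 9 | TaskGalaxy: Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types | TaskGalaxy: scales multimodal instruction fine-tuning with tens of thousands of vision task types | multimodal |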

🔬 Pillar 2: RL & Architecture (3 papers)

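| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 10 | Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model | Step-Video-T2V: proposes a 30-billion-parameter pretrained text-to-video model that generates high-quality long videos | flow matching, DPO, foundation model |
| 11 | HIPPo: Harnessing Image-to-3D Priors for Model-free Zero-shot 6D Pose Estimation | HIPPo: uses image-to-3D priors for model-free, zero-shot 6D pose estimation | dreamer, 6D pose estimation, foundation model |
| 12 | Self-Consistent Model-based Adaptation for Visual Reinforcement Learning | Proposes Self-Consistent Model-based Adaptation (SCMA) to improve the robustness of visual reinforcement learning in distracting environments | reinforcement learning, world model |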

🔬 Pillar 3: Perception & Semantics (3 papers)

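| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 13 | ReStyle3D: Scene-Level Appearance Transfer with Semantic Correspondences | ReStyle3D: a scene-level appearance transfer framework based on semantic correspondences | monocular depth, open-vocabulary, open vocabulary |
| 14 | RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control | RealCam-I2V: real-world image-to-video generation based on monocular depth estimation and interactive camera control | depth estimation, metric depth, scene reconstruction |
| 15 | Multi-view 3D surface reconstruction from SAR images by inverse rendering | Proposes an inverse-rendering method for 3D reconstruction from SAR images that does not require strict interferometric constraints | neural radiance field |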

🔬 Pillar 1: Robot Control (2 papers)

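| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 16 | ManiTrend: Bridging Future Generation and Action Prediction with 3D Flow for Robotic Manipulation | ManiTrend: uses 3D flow to bridge future generation and action prediction for robotic manipulation | manipulation, cross-embodiment, spatiotemporal |
| 17 | A Lightweight and Effective Image Tampering Localization Network with Vision Mamba | Proposes ForMa, a lightweight image tampering localization network based on Vision Mamba that models global dependencies efficiently | manipulation, Mamba, state space model |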

🔬 Pillar 4: Generative Motion (1 paper)

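| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 18 | Classifier-free Guidance with Adaptive Scaling | Proposes β-CFG, which adaptively adjusts the guidance strength of diffusion models to balance image quality and text alignment | classifier-free guidance |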

🔬 Pillar 7: Motion Retargeting (1 paper)

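| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 19 | Quantifying the Impact of Motion on 2D Gaze Estimation in Real-World Mobile Interactions | Quantifies the impact of motion on 2D gaze estimation in mobile interactions, revealing reduced accuracy in dynamic scenes | spatial relationship |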

🔬 Pillar 6: Video Extraction (1 paper)

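| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 20 | Conditional Latent Coding with Learnable Synthesized Reference for Deep Image Compression | Proposes conditional latent coding with a learnable synthesized reference for deep image compression | feature matching |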
