cs.CV(2025-02-21)

📊 18 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (11 🔗3) · Pillar 2: RL & Architecture (4 🔗2) · Pillar 4: Generative Motion (1) · Pillar 3: Perception & Semantics (1 🔗1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 9: Embodied Foundation Models (11 papers)

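# | Title | One-sentence summary | Tags | 🔗
1 | Forgotten Polygons: Multimodal Large Language Models are Shape-Blind | Reveals shortcomings of multimodal large language models in recognizing geometric shapes and proposes a visually cued chain-of-thought prompting method. | large language model, multimodal, chain-of-thought
2 | Multi-Agent Multimodal Models for Multicultural Text to Image Generation | Proposes MosAIG, a multi-agent framework that improves multicultural text-to-image generation. | large language model, multimodal
3 | M3-AGIQA: Multimodal, Multi-Round, Multi-Aspect AI-Generated Image Quality Assessment | M3-AGIQA: multimodal, multi-round, multi-aspect quality assessment of AI-generated images. | large language model, multimodal
4 | ELIP: Enhanced Visual-Language Foundation Models for Image Retrieval | ELIP: enhances visual-language foundation models to improve image retrieval performance. | foundation model
5 | LongCaptioning: Unlocking the Power of Long Video Caption Generation in Large Multimodal Models | Proposes the LongCaption-Agent framework to address the scarcity of long-text annotations for long video caption generation in large models. | multimodal
6 | M2LADS Demo: A System for Generating Multimodal Learning Analytics Dashboards | M2LADS: a system for generating multimodal learning analytics dashboards. | multimodal
7 | Fish feeding behavior recognition and intensity quantification methods in aquaculture: From single modality analysis to multimodality fusion | Surveys methods for recognizing fish feeding behavior and quantifying feeding intensity in aquaculture, from single-modality analysis to multimodal fusion. | multimodal
8 | Memory Helps, but Confabulation Misleads: Understanding Streaming Events in Videos with MLLMs | Proposes a multimodal large language model based on memory correction to improve its performance on streaming video event understanding. | large language model, multimodal
9 | MOVE: A Mixture-of-Vision-Encoders Approach for Domain-Focused Vision-Language Processing | Proposes MOVE, a mixture-of-vision-encoders approach for domain-focused vision-language processing. | large language model, multimodal
10 | WorldCraft: Photo-Realistic 3D World Creation and Customization via LLM Agents | WorldCraft: uses LLM agents to create and customize photo-realistic 3D worlds. | large language model
11 | AutoMR: A Universal Time Series Motion Recognition Pipeline | AutoMR: a universal time-series motion recognition pipeline that addresses difficulties in multimodal data processing. | multimodal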

🔬 Pillar 2: RL & Architecture (4 papers)

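# | Title | One-sentence summary | Tags | 🔗
12 | TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba | TransMamba: uses pretrained Transformer knowledge to quickly adapt to the Mamba architecture, enabling universal architecture transfer. | Mamba, distillation, foundation model
13 | Hierarchical Context Transformer for Multi-level Semantic Scene Understanding | Proposes the Hierarchical Context Transformer (HCT) for multi-level semantic scene understanding, improving surgical scene analysis. | representation learning, contrastive learning, scene understanding
14 | A Novel Riemannian Sparse Representation Learning Network for Polarimetric SAR Image Classification | Proposes a Riemannian sparse representation learning network for polarimetric SAR image classification that improves edge detail and regional homogeneity. | representation learning
15 | SiMHand: Mining Similar Hands for Large-Scale 3D Hand Pose Pre-training | SiMHand: mines similar hand images for large-scale 3D hand pose pre-training. | contrastive learning, Ego4D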

🔬 Pillar 4: Generative Motion (1 paper)

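# | Title | One-sentence summary | Tags | 🔗
16 | Human Motion Prediction, Reconstruction, and Generation | Surveys human motion prediction, reconstruction, and generation techniques and explores their applications in robotics, gaming, and virtual reality. | text-to-motion, motion synthesis, motion generation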

🔬 Pillar 3: Perception & Semantics (1 paper)

# | Title | One-sentence summary | Tags | 🔗
17 | RGB-Only Gaussian Splatting SLAM for Unbounded Outdoor Scenes | Proposes OpenGS-SLAM to address Gaussian Splatting SLAM for outdoor scenes under RGB-only input. | depth estimation, 3D gaussian splatting, 3DGS

🔬 Pillar 8: Physics-based Animation (1 paper)

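# | Title | One-sentence summary | Tags | 🔗
18 | An ocean front detection and tracking algorithm | Proposes the BFDT-MSA framework to address discontinuity, over-detection, and related problems in ocean front detection and tracking. | spatiotemporal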
