cs.CV (2025-01-25)

📊 13 papers in total | 🔗 1 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (4) · Pillar 2: RL & Architecture (4, 🔗 1) · Pillar 9: Embodied Foundation Models (3) · Pillar 4: Generative Motion (1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 3: Perception & Semantics (4 papers)

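| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 1 | Towards Better Robustness: Pose-Free 3D Gaussian Splatting for Arbitrarily Long Videos | Proposes the Rob-GS framework to address camera pose estimation in long videos. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 2 | HuGDiffusion: Generalizable Single-Image Human Rendering via 3D Gaussian Diffusion | HuGDiffusion: generalizable single-image human rendering based on 3D Gaussian diffusion. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 3 | Vision without Images: End-to-End Computer Vision from Single Compressive Measurements | Proposes CompDAE, a compressive-sensing-based approach that performs end-to-end computer vision directly from a single compressive measurement and is especially suited to low-light conditions. | depth estimation | |
| 4 | Leveraging Motion Estimation for Efficient Bayer-Domain Computer Vision | Proposes motion-estimation-based video convolution in the Bayer domain to accelerate video vision tasks. | depth estimation | |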

🔬 Pillar 2: RL & Architecture (4 papers)

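| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 5 | Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models | Proposes Finedefics, which uses attribute descriptions to strengthen fine-grained visual recognition in multimodal large language models. | contrastive learning, large language model | |
| 6 | MambaTron: Efficient Cross-Modal Point Cloud Enhancement using Aggregate Selective State Space Modeling | Proposes MambaTron, which uses aggregate selective state space modeling for efficient cross-modal point cloud enhancement. | Mamba, state space model | |
| 7 | Efficient Point Clouds Upsampling via Flow Matching | Proposes PUFM, which performs efficient point cloud upsampling via flow matching. | flow matching | |
| 8 | PolaFormer: Polarity-aware Linear Attention for Vision Transformers | PolaFormer: a polarity-aware linear attention mechanism that improves Vision Transformer performance. | linear attention | |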

🔬 Pillar 9: Embodied Foundation Models (3 papers)

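| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 9 | PatentLMM: Large Multimodal Model for Generating Descriptions for Patent Figures | Proposes PatentLMM for generating detailed descriptions of the technical drawings in patent figures. | large language model, multimodal | |
| 10 | HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding | Proposes HumanOmni, the first large vision-speech language model for human-centric scenarios. | large language model, multimodal | |
| 11 | Complementary Subspace Low-Rank Adaptation of Vision-Language Models for Few-Shot Classification | Proposes Comp-LoRA, a complementary-subspace low-rank adaptation that addresses catastrophic forgetting in few-shot classification with VLMs. | foundation model | |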

🔬 Pillar 4: Generative Motion (1 paper)

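| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 12 | KETA: Kinematic-Phrases-Enhanced Text-to-Motion Generation via Fine-grained Alignment | KETA: text-to-motion generation enhanced with kinematic phrases via fine-grained alignment. | motion diffusion model, motion diffusion, text-to-motion | |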

🔬 Pillar 8: Physics-based Animation (1 paper)

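| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 13 | SpatioTemporal Learning for Human Pose Estimation in Sparsely-Labeled Videos | STDPose: improves human pose estimation in sparsely labeled videos by learning spatiotemporal dynamics. | spatiotemporal | |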
