cs.CV (2025-01-25)
📊 13 papers in total | 🔗 1 with code
🎯 Interest area navigation
Pillar 3: Spatial Perception & Semantics (4)
Pillar 2: RL Algorithms & Architecture (4 🔗1)
Pillar 9: Embodied Foundation Models (3)
Pillar 4: Generative Motion (1)
Pillar 8: Physics-based Animation (1)
🔬 Pillar 3: Spatial Perception & Semantics (4 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | Towards Better Robustness: Pose-Free 3D Gaussian Splatting for Arbitrarily Long Videos | Proposes the Rob-GS framework to address camera pose estimation in long videos. | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 2 | HuGDiffusion: Generalizable Single-Image Human Rendering via 3D Gaussian Diffusion | HuGDiffusion: generalizable single-image human rendering based on 3D Gaussian diffusion. | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 3 | Vision without Images: End-to-End Computer Vision from Single Compressive Measurements | Proposes CompDAE, a compressive-sensing-based approach that performs end-to-end computer vision directly from a single compressive measurement, particularly suited to low-light conditions. | depth estimation | | |
| 4 | Leveraging Motion Estimation for Efficient Bayer-Domain Computer Vision | Proposes motion-estimation-based Bayer-domain video convolution to accelerate video vision tasks. | depth estimation | | |
🔬 Pillar 2: RL Algorithms & Architecture (4 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 5 | Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models | Proposes Finedefics, which uses attribute descriptions to strengthen multimodal large language models on fine-grained visual recognition. | contrastive learning, large language model | ✅ | |
| 6 | MambaTron: Efficient Cross-Modal Point Cloud Enhancement using Aggregate Selective State Space Modeling | Proposes MambaTron, which uses aggregate selective state space modeling for efficient cross-modal point cloud enhancement. | Mamba, state space model | | |
| 7 | Efficient Point Clouds Upsampling via Flow Matching | Proposes PUFM, which performs efficient point cloud upsampling via flow matching. | flow matching | | |
| 8 | PolaFormer: Polarity-aware Linear Attention for Vision Transformers | PolaFormer: a polarity-aware linear attention mechanism that improves Vision Transformer performance. | linear attention | | |
🔬 Pillar 9: Embodied Foundation Models (3 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 9 | PatentLMM: Large Multimodal Model for Generating Descriptions for Patent Figures | Proposes PatentLMM for generating detailed descriptions of technical drawings in patent figures. | large language model, multimodal | | |
| 10 | HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding | Proposes HumanOmni, the first large vision-speech language model for human-centric scenes. | large language model, multimodal | | |
| 11 | Complementary Subspace Low-Rank Adaptation of Vision-Language Models for Few-Shot Classification | Proposes Comp-LoRA, a complementary-subspace low-rank adaptation method that addresses catastrophic forgetting in few-shot classification with VLMs. | foundation model | | |
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 12 | KETA: Kinematic-Phrases-Enhanced Text-to-Motion Generation via Fine-grained Alignment | KETA: text-to-motion generation enhanced with kinematic phrases via fine-grained alignment. | motion diffusion model, motion diffusion, text-to-motion | | |
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 13 | SpatioTemporal Learning for Human Pose Estimation in Sparsely-Labeled Videos | STDPose: improves human pose estimation in sparsely labeled videos by learning spatiotemporal dynamics. | spatiotemporal | | |