cs.CV(2025-05-03)
📊 17 papers in total | 🔗 3 with code
🎯 Interest Area Navigation
Pillar 3: Spatial Perception & Semantics (4 🔗1)
Pillar 9: Embodied Foundation Models (4)
Pillar 2: RL Algorithms & Architecture (2)
Pillar 6: Video Extraction & Matching (2 🔗1)
Pillar 7: Motion Retargeting (1)
Pillar 8: Physics-based Animation (1)
Pillar 4: Generative Motion (1)
Pillar 1: Robot Control (1)
Pillar 5: Interaction & Reaction (1 🔗1)
🔬 Pillar 3: Spatial Perception & Semantics (4 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | AquaGS: Fast Underwater Scene Reconstruction with SfM-Free Gaussian Splatting | AquaGS: a fast, SfM-free Gaussian splatting method for underwater scene reconstruction | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 2 | GenSync: A Generalized Talking Head Framework for Audio-driven Multi-Subject Lip-Sync using 3D Gaussian Splatting | GenSync: a generalized talking-head framework based on 3D Gaussian splatting for audio-driven multi-subject lip-sync | 3D gaussian splatting, gaussian splatting, splatting | | |
| 3 | HybridGS: High-Efficiency Gaussian Splatting Data Compression using Dual-Channel Sparse Representation and Point Cloud Encoder | HybridGS: efficient 3D Gaussian splatting data compression using a dual-channel sparse representation and a point cloud encoder | 3D gaussian splatting, 3DGS, gaussian splatting | ✅ | |
| 4 | RESAnything: Attribute Prompting for Arbitrary Referring Segmentation | RESAnything: zero-shot segmentation of arbitrary referring expressions via attribute prompting | open-vocabulary, open vocabulary, large language model | | |
🔬 Pillar 9: Embodied Foundation Models (4 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 5 | Knowledge-Augmented Language Models Interpreting Structured Chest X-Ray Findings | CXR-TextInter: interprets structured chest X-ray findings with a knowledge-augmented language model | large language model, foundation model, multimodal | | |
| 6 | Automated ARAT Scoring Using Multimodal Video Analysis, Multi-View Fusion, and Hierarchical Bayesian Models: A Clinician Study | Proposes an automated ARAT scoring system based on multimodal video analysis and hierarchical Bayesian models to make stroke rehabilitation assessment more efficient | multimodal | | |
| 7 | Mitigating Group-Level Fairness Disparities in Federated Visual Language Models | Proposes the FVL-FP framework to address group-level fairness disparities in federated visual language models | multimodal | | |
| 8 | Enhancing the Learning Experience: Using Vision-Language Models to Generate Questions for Educational Videos | Uses vision-language models to generate questions for educational videos, improving the learning experience | multimodal | | |
🔬 Pillar 2: RL Algorithms & Architecture (2 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 9 | Multimodal Graph Representation Learning for Robust Surgical Workflow Recognition with Adversarial Feature Disentanglement | Proposes a multimodal graph representation learning method based on adversarial disentanglement to make surgical workflow recognition more robust to data corruption | representation learning, multimodal | | |
| 10 | PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth | PosePilot: uses self-supervised depth to improve camera pose controllability in generative world models | world model, depth estimation, geometric consistency | | |
🔬 Pillar 6: Video Extraction & Matching (2 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 11 | Vision and Intention Boost Large Language Model in Long-Term Action Anticipation | Proposes the ICVL model, which uses vision and intention to improve LLM performance in long-term action anticipation | Ego4D, large language model | | |
| 12 | MVHumanNet++: A Large-scale Dataset of Multi-view Daily Dressing Human Captures with Richer Annotations for 3D Human Digitization | Presents MVHumanNet++, a large-scale multi-view human dataset, to advance research on 3D human digitization | SMPL, SMPL-X, large language model | ✅ | |
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 13 | 3DWG: 3D Weakly Supervised Visual Grounding via Category and Instance-Level Alignment | Proposes the 3DWG model for weakly supervised 3D visual grounding via category- and instance-level alignment | spatial relationship, visual grounding | | |
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 14 | VideoLLM Benchmarks and Evaluation: A Survey | A survey of VideoLLM benchmarks and evaluation: comprehensive analysis and future directions | spatiotemporal, large language model, multimodal | | |
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 15 | Efficient 3D Full-Body Motion Generation from Sparse Tracking Inputs with Temporal Windows | Proposes an MLP-based temporal-window method for efficient 3D full-body motion generation from sparse inputs | motion generation | | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 16 | Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models | Proposes a visual perspective-taking benchmark that reveals the shortcomings of VLMs in spatial reasoning | humanoid, scene understanding | | |
🔬 Pillar 5: Interaction & Reaction (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 17 | Co$^{3}$Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion | Proposes Co$^{3}$Gesture to address co-speech gesture generation for two-person interaction | mutual attention | ✅ | |