cs.CV (2025-05-03)

📊 17 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (4 🔗1) · Pillar 9: Embodied Foundation Models (4) · Pillar 2: RL Algorithms & Architecture (2) · Pillar 6: Video Extraction & Matching (2 🔗1) · Pillar 7: Motion Retargeting (1) · Pillar 8: Physics-based Animation (1) · Pillar 4: Generative Motion (1) · Pillar 1: Robot Control (1) · Pillar 5: Interaction & Reaction (1 🔗1)

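🔬 Pillar 3: Spatial Perception & Semantics (4 papers)

| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 1 | AquaGS: Fast Underwater Scene Reconstruction with SfM-Free Gaussian Splatting | AquaGS: a fast, SfM-free Gaussian splatting method for underwater scene reconstruction | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 2 | GenSync: A Generalized Talking Head Framework for Audio-driven Multi-Subject Lip-Sync using 3D Gaussian Splatting | GenSync: a generalized talking-head framework based on 3D Gaussian splatting for audio-driven multi-subject lip-sync | 3D gaussian splatting, gaussian splatting, splatting | |
| 3 | HybridGS: High-Efficiency Gaussian Splatting Data Compression using Dual-Channel Sparse Representation and Point Cloud Encoder | HybridGS: efficient 3D Gaussian splatting data compression using a dual-channel sparse representation and a point cloud encoder | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 4 | RESAnything: Attribute Prompting for Arbitrary Referring Segmentation | RESAnything: zero-shot segmentation of arbitrary referring expressions via attribute prompting | open-vocabulary, open vocabulary, large language model | |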

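🔬 Pillar 9: Embodied Foundation Models (4 papers)

| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 5 | Knowledge-Augmented Language Models Interpreting Structured Chest X-Ray Findings | CXR-TextInter: interpreting structured chest X-ray findings with a knowledge-augmented language model | large language model, foundation model, multimodal | |
| 6 | Automated ARAT Scoring Using Multimodal Video Analysis, Multi-View Fusion, and Hierarchical Bayesian Models: A Clinician Study | Proposes an automated ARAT scoring system based on multimodal video analysis and hierarchical Bayesian models to make stroke rehabilitation assessment more efficient | multimodal | |
| 7 | Mitigating Group-Level Fairness Disparities in Federated Visual Language Models | Proposes the FVL-FP framework to address group-level fairness disparities in federated vision-language models | multimodal | |
| 8 | Enhancing the Learning Experience: Using Vision-Language Models to Generate Questions for Educational Videos | Uses vision-language models to generate questions for educational videos and improve the learning experience | multimodal | |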

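🔬 Pillar 2: RL Algorithms & Architecture (2 papers)

| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 9 | Multimodal Graph Representation Learning for Robust Surgical Workflow Recognition with Adversarial Feature Disentanglement | Proposes a multimodal graph representation learning method based on adversarial disentanglement to make surgical workflow recognition more robust to data corruption | representation learning, multimodal | |
| 10 | PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth | PosePilot: uses self-supervised depth to improve camera pose controllability in generative world models | world model, depth estimation, geometric consistency | |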

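🔬 Pillar 6: Video Extraction & Matching (2 papers)

| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 11 | Vision and Intention Boost Large Language Model in Long-Term Action Anticipation | Proposes the ICVL model, which uses vision and intention to improve LLM performance in long-term action anticipation | Ego4D, large language model | |
| 12 | MVHumanNet++: A Large-scale Dataset of Multi-view Daily Dressing Human Captures with Richer Annotations for 3D Human Digitization | Proposes MVHumanNet++, a large-scale multi-view human dataset, to advance research on 3D human digitization | SMPL, SMPL-X, large language model | |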

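🔬 Pillar 7: Motion Retargeting (1 paper)

| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 13 | 3DWG: 3D Weakly Supervised Visual Grounding via Category and Instance-Level Alignment | Proposes the 3DWG model, which achieves weakly supervised 3D visual grounding through category- and instance-level alignment | spatial relationship, visual grounding | |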

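🔬 Pillar 8: Physics-based Animation (1 paper)

| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 14 | VideoLLM Benchmarks and Evaluation: A Survey | A survey of VideoLLM benchmarks and evaluation: comprehensive analysis and future directions | spatiotemporal, large language model, multimodal | |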

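🔬 Pillar 4: Generative Motion (1 paper)

| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 15 | Efficient 3D Full-Body Motion Generation from Sparse Tracking Inputs with Temporal Windows | Proposes an MLP-based temporal-window method that efficiently generates 3D full-body motion from sparse inputs | motion generation | |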

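🔬 Pillar 1: Robot Control (1 paper)

| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 16 | Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models | Proposes a visual perspective-taking benchmark that reveals the shortcomings of VLMs in spatial reasoning | humanoid, scene understanding | |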

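🔬 Pillar 5: Interaction & Reaction (1 paper)

| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 17 | Co$^{3}$Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion | Proposes Co$^{3}$Gesture to address co-speech gesture generation for two-person interaction | mutual attention | |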
