cs.CV(2025-05-03)

📊 17 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (4 🔗1) · Pillar 9: Embodied Foundation Models (4) · Pillar 2: RL Algorithms & Architecture (2) · Pillar 6: Video Extraction & Matching (2 🔗1) · Pillar 7: Motion Retargeting (1) · Pillar 8: Physics-based Animation (1) · Pillar 4: Generative Motion (1) · Pillar 1: Robot Control (1) · Pillar 5: Interaction & Reaction (1 🔗1)

🔬 Pillar 3: Spatial Perception & Semantics (4 papers)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | AquaGS: Fast Underwater Scene Reconstruction with SfM-Free Gaussian Splatting | Proposes AquaGS to address slow, low-accuracy underwater scene reconstruction | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 2 | GenSync: A Generalized Talking Head Framework for Audio-driven Multi-Subject Lip-Sync using 3D Gaussian Splatting | Proposes the GenSync framework to address multi-identity lip synchronization | 3D gaussian splatting, gaussian splatting, splatting | |
| 3 | HybridGS: High-Efficiency Gaussian Splatting Data Compression using Dual-Channel Sparse Representation and Point Cloud Encoder | Proposes HybridGS to address the low compression efficiency of 3D Gaussian point clouds | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 4 | RESAnything: Attribute Prompting for Arbitrary Referring Segmentation | Proposes RESAnything to address arbitrary referring segmentation | open-vocabulary, open vocabulary, large language model | |

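🔬 Pillar 9: Embodied Foundation Models (4 papers)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 5 | Knowledge-Augmented Language Models Interpreting Structured Chest X-Ray Findings | Proposes CXR-TextInter to address chest X-ray interpretation | large language model, foundation model, multimodal | |
| 6 | Automated ARAT Scoring Using Multimodal Video Analysis, Multi-View Fusion, and Hierarchical Bayesian Models: A Clinician Study | Proposes an automated ARAT scoring system to address the time cost and accuracy of stroke rehabilitation assessment | multimodal | |
| 7 | Mitigating Group-Level Fairness Disparities in Federated Visual Language Models | Proposes the FVL-FP framework to address group fairness in federated vision-language models | multimodal | |
| 8 | Enhancing the Learning Experience: Using Vision-Language Models to Generate Questions for Educational Videos | Uses vision-language models to generate questions for educational videos and improve the learning experience | multimodal | |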

🔬 Pillar 2: RL Algorithms & Architecture (2 papers)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 9 | Multimodal Graph Representation Learning for Robust Surgical Workflow Recognition with Adversarial Feature Disentanglement | Proposes multimodal graph representation learning to address robustness in surgical workflow recognition | representation learning, multimodal | |
| 10 | PosePilot: Steering Camera Pose for Generative World Models with Self-supervised Depth | Proposes PosePilot to address camera pose control | world model, depth estimation, geometric consistency | |

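🔬 Pillar 6: Video Extraction & Matching (2 papers)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 11 | Vision and Intention Boost Large Language Model in Long-Term Action Anticipation | Proposes an intention-conditioned vision-language model to address long-term action anticipation | Ego4D, large language model | |
| 12 | MVHumanNet++: A Large-scale Dataset of Multi-view Daily Dressing Human Captures with Richer Annotations for 3D Human Digitization | Proposes MVHumanNet++ to address the shortage of data for 3D human digitization | SMPL, SMPL-X, large language model | |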

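🔬 Pillar 7: Motion Retargeting (1 paper)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 13 | 3DWG: 3D Weakly Supervised Visual Grounding via Category and Instance-Level Alignment | Proposes 3DWG to address category- and instance-level complexity in weakly supervised 3D visual grounding | spatial relationship, visual grounding | |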

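🔬 Pillar 8: Physics-based Animation (1 paper)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 14 | VideoLLM Benchmarks and Evaluation: A Survey | A survey of benchmarks and evaluation methodology for video large language models | spatiotemporal, large language model, multimodal | |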

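🔬 Pillar 4: Generative Motion (1 paper)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 15 | Efficient 3D Full-Body Motion Generation from Sparse Tracking Inputs with Temporal Windows | Proposes an efficient MLP-based method for 3D full-body motion generation from sparse tracking inputs | motion generation | |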

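🔬 Pillar 1: Robot Control (1 paper)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 16 | Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models | Proposes an evaluation method for visual perspective taking in vision-language models | humanoid, scene understanding | |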

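🔬 Pillar 5: Interaction & Reaction (1 paper)

| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 17 | Co$^{3}$Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion | Proposes Co$^{3}$Gesture to address two-person interactive co-speech gesture generation | mutual attention | |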
