cs.CV (2025-04-28)

📊 26 papers in total | 🔗 9 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (10 🔗4) · Pillar 2: RL & Architecture (5 🔗1) · Pillar 3: Perception & Semantics (5 🔗3) · Pillar 5: Interaction & Reaction (2) · Pillar 4: Generative Motion (2) · Pillar 7: Motion Retargeting (1) · Pillar 1: Robot Control (1 🔗1)

🔬 Pillar 9: Embodied Foundation Models (10 papers)

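| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Enhancing Surgical Documentation through Multimodal Visual-Temporal Transformers and Generative AI | Proposes a method for automatically generating surgical documentation based on multimodal visual-temporal Transformers and generative AI | large language model, multimodal | |
| 2 | Can a Large Language Model Assess Urban Design Quality? Evaluating Walkability Metrics Across Expertise Levels | Uses large language models to assess urban design quality: an analysis of walkability metrics across expertise levels | large language model, multimodal | |
| 3 | DEEMO: De-identity Multimodal Emotion Recognition and Reasoning | Proposes the DEEMO framework for de-identified multimodal emotion recognition and reasoning | large language model, multimodal | |
| 4 | DeepAndes: A Self-Supervised Vision Foundation Model for Multi-Spectral Remote Sensing Imagery of the Andes | DeepAndes: a self-supervised vision foundation model for multi-spectral remote sensing imagery of the Andes | foundation model | |
| 5 | A Transformer-based Multimodal Fusion Model for Efficient Crowd Counting Using Visual and Wireless Signals | Proposes the TransFusion model, which fuses visual and wireless signals for efficient crowd counting | multimodal | |
| 6 | SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning | Proposes SpatialReasoner to address 3D spatial reasoning | large language model, foundation model, chain-of-thought | |
| 7 | SRMF: A Data Augmentation and Multimodal Fusion Approach for Long-Tail UHR Satellite Image Segmentation | SRMF: a data augmentation and multimodal fusion approach for long-tail UHR satellite image segmentation | multimodal | |
| 8 | LR-IAD:Mask-Free Industrial Anomaly Detection with Logical Reasoning | Proposes LR-IAD, which performs mask-free industrial anomaly detection through logical reasoning and significantly improves performance | large language model, multimodal, chain-of-thought | |
| 9 | FSBench: A Figure Skating Benchmark for Advancing Artistic Sports Understanding | Proposes the FSBench figure skating benchmark dataset to advance artistic sports understanding | multimodal | |
| 10 | Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video | Prisma: an open-source toolkit for interpretability in vision and video | multimodal | |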

🔬 Pillar 2: RL & Architecture (5 papers)

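| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 11 | CoherenDream: Boosting Holistic Text Coherence in 3D Generation via Multimodal Large Language Models Feedback | CoherenDream: uses feedback from multimodal large language models to improve text coherence in 3D generation | distillation, large language model, multimodal | |
| 12 | Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding | Proposes MPEC for open-vocabulary 3D scene understanding, improving semantic segmentation and zero-shot capability | contrastive learning, scene understanding, open-vocabulary | |
| 13 | DiVE: Efficient Multi-View Driving Scenes Generation Based on Video Diffusion Transformer | DiVE: efficient multi-view driving scene generation based on a video diffusion Transformer | distillation, classifier-free guidance, spatiotemporal | |
| 14 | Mesh-Learner: Texturing Mesh with Spherical Harmonics | Mesh-Learner: differentiable mesh rendering and reconstruction using spherical harmonic textures | reinforcement learning, 3D gaussian splatting, gaussian splatting | |
| 15 | Taming the Randomness: Towards Label-Preserving Cropping in Contrastive Learning | Proposes a label-preserving cropping method that improves the robustness of contrastive learning in image classification | contrastive learning | |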

🔬 Pillar 3: Perception & Semantics (5 papers)

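| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 16 | Joint Optimization of Neural Radiance Fields and Continuous Camera Motion from a Monocular Video | Proposes CoPE-NeRF, which jointly optimizes a neural radiance field and continuous camera motion for 3D reconstruction from monocular video | depth estimation, NeRF, neural radiance field | |
| 17 | STCOcc: Sparse Spatial-Temporal Cascade Renovation for 3D Occupancy and Scene Flow Prediction | Proposes STCOcc, which uses sparse spatial-temporal cascade renovation for 3D occupancy and scene flow prediction | scene flow | |
| 18 | CE-NPBG: Connectivity Enhanced Neural Point-Based Graphics for Novel View Synthesis in Autonomous Driving Scenes | CE-NPBG: a connectivity-enhanced neural point-based graphics method for novel view synthesis in autonomous driving scenes | 3D gaussian splatting, gaussian splatting, splatting | |
| 19 | Category-Level and Open-Set Object Pose Estimation for Robotics | Studies category-level and open-set object pose estimation methods for robotics | scene understanding, 6D pose estimation | |
| 20 | MP-SfM: Monocular Surface Priors for Robust Structure-from-Motion | MP-SfM: uses monocular surface priors for robust Structure-from-Motion | monocular depth | |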

🔬 Pillar 5: Interaction & Reaction (2 papers)

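| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 21 | Foundation Model-Driven Framework for Human-Object Interaction Prediction with Segmentation Mask Integration | Proposes the Seg2HOI framework, which integrates segmentation models to enhance human-object interaction prediction and achieve zero-shot generalization | human-object interaction, HOI, foundation model | |
| 22 | HOIGaze: Gaze Estimation During Hand-Object Interactions in Extended Reality Exploiting Eye-Hand-Head Coordination | HOIGaze: exploits eye-hand-head coordination to improve gaze estimation accuracy during hand-object interactions in extended reality | HOI | |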

🔬 Pillar 4: Generative Motion (2 papers)

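| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 23 | Physics-Informed Diffusion Models for SAR Ship Wake Generation from Text Prompts | Proposes a physics-informed diffusion model for generating SAR ship wakes from text prompts | physics-informed diffusion | |
| 24 | CasaGPT: Cuboid Arrangement and Scene Assembly for Interior Design | CasaGPT: proposes an indoor scene synthesis method based on cuboid arrangement that improves scene realism | physically plausible | |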

🔬 Pillar 7: Motion Retargeting (1 paper)

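| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 25 | Learning Streaming Video Representation via Multitask Training | Proposes StreamFormer, which learns efficient streaming video representations through multitask training and is suited to real-time applications | spatial relationship, embodied AI | |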

🔬 Pillar 1: Robot Control (1 paper)

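| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 26 | ShowMak3r: Compositional TV Show Reconstruction | ShowMak3r: proposes a compositional TV show scene reconstruction method for editing and manipulating actors and scenes | manipulation, TAMP | |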
