cs.CV (2025-04-23)
📊 27 papers in total | 🔗 9 with code
🎯 Navigation by Area of Interest
Pillar 2: RL Algorithms & Architecture (9 🔗5)
Pillar 3: Spatial Perception & Semantics (6 🔗1)
Pillar 9: Embodied Foundation Models (5 🔗1)
Pillar 8: Physics-based Animation (4 🔗1)
Pillar 1: Robot Control (1 🔗1)
Pillar 5: Interaction & Reaction (1)
Pillar 6: Video Extraction & Matching (1)
🔬 Pillar 2: RL Algorithms & Architecture (9 papers)
🔬 Pillar 3: Spatial Perception & Semantics (6 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 10 | TraveLLaMA: A Multimodal Travel Assistant with Large-Scale Dataset and Structured Reasoning | TraveLLaMA: a multimodal travel assistant built on a large-scale dataset and structured reasoning | scene understanding, multimodal, chain-of-thought | | |
| 11 | Gaussian Splatting is an Effective Data Generator for 3D Object Detection | Uses Gaussian splatting for data augmentation to improve 3D object detection in autonomous driving | gaussian splatting, splatting | | |
| 12 | Visibility-Uncertainty-guided 3D Gaussian Inpainting via Scene Conceptional Learning | Proposes VISTA, a framework for high-quality 3D Gaussian inpainting guided by visibility uncertainty and scene-concept learning | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 13 | ToF-Splatting: Dense SLAM using Sparse Time-of-Flight Depth and Multi-Frame Integration | ToF-Splatting: dense SLAM using sparse ToF depth and multi-frame fusion | 3D gaussian splatting, gaussian splatting, splatting | | |
| 14 | Dual-Camera All-in-Focus Neural Radiance Fields | Proposes DC-NeRF, which uses a dual-camera setup to synthesize all-in-focus neural radiance fields without manual refocusing | NeRF, neural radiance field | | |
| 15 | SaENeRF: Suppressing Artifacts in Event-based Neural Radiance Fields | SaENeRF: a self-supervised framework that suppresses artifacts in event-based neural radiance field reconstruction | NeRF, neural radiance field | ✅ | |
🔬 Pillar 9: Embodied Foundation Models (5 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 16 | DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs | DyMU: training-free dynamic token merging and virtual unmerging that make vision-language models more efficient | large language model | ✅ | |
| 17 | Unveiling the Lack of LVLM Robustness to Fundamental Visual Variations: Why and Path Forward | Shows that LVLMs lack robustness to fundamental visual variations and proposes directions for improvement | multimodal | | |
| 18 | VideoVista-CulturalLingo: 360° Horizons-Bridging Cultures, Languages, and Domains in Video Comprehension | Proposes VideoVista-CulturalLingo, a video-comprehension benchmark that bridges cultural, linguistic, and domain differences | multimodal | | |
| 19 | Facial Foundational Model Advances Early Warning of Coronary Artery Disease from Live Videos with DigitalShadow | DigitalShadow: uses a facial foundation model for early warning of coronary artery disease from live video | foundation model | | |
| 20 | Think Hierarchically, Act Dynamically: Hierarchical Multi-modal Fusion and Reasoning for Vision-and-Language Navigation | Proposes MFRA, a multi-level fusion and reasoning architecture that improves decision accuracy in vision-and-language navigation | VLN | | |
🔬 Pillar 8: Physics-based Animation (4 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 21 | A multi-scale vision transformer-based multimodal GeoAI model for mapping Arctic permafrost thaw | Proposes a multimodal GeoAI model built on a multi-scale Vision Transformer for accurately mapping Arctic permafrost thaw slumps | spatiotemporal, multimodal | | |
| 22 | 4D Multimodal Co-attention Fusion Network with Latent Contrastive Alignment for Alzheimer's Diagnosis | Proposes M2M-AlignNet, which fuses sMRI and fMRI through multimodal alignment and co-attention for Alzheimer's disease diagnosis | spatiotemporal, multimodal | | |
| 23 | Direct Video-Based Spatiotemporal Deep Learning for Cattle Lameness Detection | Proposes a video-based spatiotemporal deep learning method for cattle lameness detection that needs no pose-estimation preprocessing | spatiotemporal | | |
| 24 | BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation | Proposes BadVideo, a stealthy backdoor attack framework targeting text-to-video generation models | spatiotemporal | ✅ | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 25 | PPS-Ctrl: Controllable Sim-to-Real Translation for Colonoscopy Depth Estimation | PPS-Ctrl: controllable sim-to-real image translation for colonoscopy depth estimation | sim-to-real, depth estimation | ✅ | |
🔬 Pillar 5: Interaction & Reaction (1 paper)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 26 | A Time Series Dataset of NIR Spectra and RGB and NIR-HSI Images of the Barley Germination Process | Releases a time-series dataset of NIR spectra, RGB images, and NIR hyperspectral images of the barley germination process | HSI | | |
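🔬 Pillar 6: Video Extraction & Matching (1 paper)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 27 | PRaDA: Projective Radial Distortion Averaging | Proposes PRaDA, which performs robust automatic calibration of radially distorted cameras in projective space | feature matching | | |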