cs.CV(2025-04-25)
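📊 21 papers in total | 🔗 4 with code
🎯 Interest Area Navigation
Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (8)
Pillar 9: Embodied Foundation Models (Embodied Foundation Models) (6 🔗3)
Pillar 2: RL Algorithms and Architecture (RL & Architecture) (3 🔗1)
Pillar 7: Motion Retargeting (Motion Retargeting) (1)
Pillar 5: Interaction and Reaction (Interaction & Reaction) (1)
Pillar 6: Video Extraction and Matching (Video Extraction) (1)
Pillar 1: Robot Control (Robot Control) (1)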
🔬 Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (8 papers)
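🔬 Pillar 9: Embodied Foundation Models (Embodied Foundation Models) (6 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 9 | Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization | Proposes UV-CoT, which achieves unsupervised visual chain-of-thought reasoning through preference optimization, improving the visual understanding of multimodal large models. | large language model multimodal chain-of-thought | ✅ | |
| 10 | SORT3D: Spatial Object-centric Reasoning Toolbox for Zero-Shot 3D Grounding Using Large Language Models | SORT3D: a spatial object-centric reasoning toolbox that uses large language models for zero-shot 3D scene understanding. | large language model | ✅ | |
| 11 | A Multimodal Hybrid Late-Cascade Fusion Network for Enhanced 3D Object Detection | Proposes a hybrid cascade fusion network that uses LiDAR and RGB information to improve 3D object detection performance. | multimodal | | |
| 12 | Revisiting Data Auditing in Large Vision-Language Models | Reveals the limitations of membership inference for data auditing in large vision-language models and explores scenarios where it remains feasible. | generalist agent large language model visual grounding | | |
| 13 | SSL4Eco: A Global Seasonal Dataset for Geospatial Foundation Models in Ecology | Proposes SSL4Eco, a global seasonal dataset that improves the downstream-task performance of geospatial foundation models for ecology. | foundation model | ✅ | |
| 14 | From Mapping to Composing: A Two-Stage Framework for Zero-shot Composed Image Retrieval | Proposes a two-stage framework for zero-shot composed image retrieval. | multimodal | | |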
🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (3 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 15 | S3MOT: Monocular 3D Object Tracking with Selective State Space Model | S3MOT: monocular 3D object tracking based on a selective state space model. | state space model contrastive learning spatiotemporal | ✅ | |
| 16 | Co-Training with Active Contrastive Learning and Meta-Pseudo-Labeling on 2D Projections for Deep Semi-Supervised Learning | Proposes active-DeepFA, which combines active contrastive learning with meta-pseudo-labeling to improve semi-supervised image classification on small-sample biological images. | contrastive learning teacher-student | | |
| 17 | TSCL:Multi-party loss Balancing scheme for deep learning Image steganography based on Curriculum learning | Proposes TSCL, a curriculum-learning-based multi-party loss balancing scheme for deep learning image steganography. | curriculum learning | | |
🔬 Pillar 7: Motion Retargeting (Motion Retargeting) (1 paper)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 18 | Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation | Eval3D: an interpretable, fine-grained evaluation tool for 3D generation. | geometric consistency large language model foundation model | | |
🔬 Pillar 5: Interaction and Reaction (Interaction & Reaction) (1 paper)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 19 | ActionArt: Advancing Multimodal Large Models for Fine-Grained Human-Centric Video Understanding | ActionArt: proposes a multimodal large model approach for fine-grained human-centric video understanding. | human-object interaction multimodal | | |
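🔬 Pillar 6: Video Extraction and Matching (Video Extraction) (1 paper)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 20 | RSRNav: Reasoning Spatial Relationship for Image-Goal Navigation | Proposes RSRNav, which reasons about spatial relationships to address missing directional information and viewpoint inconsistency in image-goal navigation. | egocentric spatial relationship | | |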
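🔬 Pillar 1: Robot Control (Robot Control) (1 paper)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 21 | COCO-Inpaint: A Benchmark for Image Inpainting Detection and Manipulation Localization | Proposes the COCO-Inpaint benchmark for research on detecting and localizing inpainting-based image manipulation. | manipulation | | |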