cs.CV (2026-02-19)
📊 21 papers in total | 🔗 5 with code
🎯 Interest Area Navigation
Pillar 3: Perception & Semantics (8 🔗1)
Pillar 9: Embodied Foundation Models (7 🔗2)
Pillar 2: RL & Architecture (3 🔗1)
Pillar 1: Robot Control (1)
Pillar 4: Generative Motion (1 🔗1)
Pillar 8: Physics-based Animation (1)
🔬 Pillar 3: Perception & Semantics (8 papers)
| # | Title | One-Sentence Takeaway | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | 3D Scene Rendering with Multimodal Gaussian Splatting | Proposes a multimodal Gaussian splatting method for 3D scene rendering that improves reconstruction quality in adverse environments. | 3D gaussian splatting, gaussian splatting, splatting | | |
| 2 | NRGS-SLAM: Monocular Non-Rigid SLAM for Endoscopy via Deformation-Aware 3D Gaussian Splatting | NRGS-SLAM: monocular non-rigid SLAM for endoscopy based on deformation-aware 3D Gaussian splatting. | 3D gaussian splatting, gaussian splatting, splatting | | |
| 3 | B$^3$-Seg: Camera-Free, Training-Free 3DGS Segmentation via Analytic EIG and Beta-Bernoulli Bayesian Updates | B$^3$-Seg: camera-free, training-free 3DGS segmentation based on analytic EIG and Beta-Bernoulli Bayesian updates. | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 4 | Cholec80-port: A Geometrically Consistent Trocar Port Segmentation Dataset for Robust Surgical Scene Understanding | Proposes the geometrically consistent Cholec80-port dataset to improve the robustness of surgical scene understanding. | visual SLAM, scene understanding, geometric consistency | | |
| 5 | 4D Monocular Surgical Reconstruction under Arbitrary Camera Motions | Proposes Local-EndoGS for 4D reconstruction of monocular endoscopic surgical scenes under arbitrary camera motion. | monocular depth, stereo depth, 3D gaussian splatting | ✅ | |
| 6 | Neural Implicit Representations for 3D Synthetic Aperture Radar Imaging | Proposes a 3D synthetic aperture radar imaging method based on neural implicit representations that addresses reconstruction artifacts under sparse sampling. | implicit representation | | |
| 7 | Inferring Height from Earth Embeddings: First insights using Google AlphaEarth | Uses AlphaEarth embeddings with deep learning regression models to achieve accurate regional surface height mapping. | height map, multimodal | | |
| 8 | IntRec: Intent-based Retrieval with Contrastive Refinement | Proposes IntRec, an interactive object retrieval framework that improves retrieval accuracy in complex scenes by contrastively refining user intent. | open-vocabulary, open vocabulary | | |
🔬 Pillar 9: Embodied Foundation Models (7 papers)
🔬 Pillar 2: RL & Architecture (3 papers)
| # | Title | One-Sentence Takeaway | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 16 | BadCLIP++: Stealthy and Persistent Backdoors in Multimodal Contrastive Learning | BadCLIP++: proposes a stealthy and persistent backdoor attack framework for multimodal contrastive learning. | contrastive learning, multimodal | | |
| 17 | SpectralGCD: Spectral Concept Selection and Cross-modal Representation Learning for Generalized Category Discovery | Proposes SpectralGCD, which uses spectral concept selection and cross-modal representation learning to address generalized category discovery. | representation learning, distillation, multimodal | ✅ | |
| 18 | RetouchIQ: MLLM Agents for Instruction-Based Image Retouching with Generalist Reward | RetouchIQ: an MLLM agent with a generalist reward for instruction-driven image retouching. | reinforcement learning, large language model, multimodal | | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-Sentence Takeaway | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
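| 19 | Leveraging Contrastive Learning for a Similarity-Guided Tampered Document Data Generation Pipeline | Proposes a contrastive-learning-based, similarity-guided pipeline for generating tampered document data, improving the performance of tampering detection models. | manipulation, contrastive learning | | |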
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-Sentence Takeaway | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
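| 20 | PartRAG: Retrieval-Augmented Part-Level 3D Generation and Editing | PartRAG: proposes a retrieval-augmented part-level 3D generation and editing framework that improves generation quality and editing capability. | physically plausible | ✅ | |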
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | One-Sentence Takeaway | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 21 | EA-Swin: An Embedding-Agnostic Swin Transformer for AI-Generated Video Detection | Proposes EA-Swin to improve the generalization and accuracy of AI-generated video detection. | spatiotemporal | | |