cs.CV(2026-04-30)
📊 43 papers in total | 🔗 12 with code
🎯 Navigation by Area of Interest
Pillar 9: Embodied Foundation Models (12 🔗3)
Pillar 3: Perception & Semantics (11 🔗4)
Pillar 2: RL & Architecture (10 🔗3)
Pillar 6: Video Extraction (3 🔗1)
Pillar 1: Robot Control (2)
Pillar 7: Motion Retargeting (2)
Pillar 8: Physics-based Animation (2 🔗1)
Pillar 4: Generative Motion (1)
🔬 Pillar 9: Embodied Foundation Models (12 papers)
🔬 Pillar 3: Perception & Semantics (11 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
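| 13 | Faster 3D Gaussian Splatting Convergence via Structure-Aware Densification | Proposes structure-aware density control to accelerate 3D Gaussian Splatting convergence and improve reconstruction quality. | 3D gaussian splatting, gaussian splatting, splatting | | |
| 14 | Sparse-View 3D Gaussian Splatting in the Wild | Proposes a sparse-view 3D Gaussian Splatting method for novel view synthesis in real-world (in-the-wild) scenes. | 3D gaussian splatting, gaussian splatting, splatting | ✅ | |
| 15 | Residual Gaussian Splatting for Ultra Sparse-View CBCT Reconstruction | Proposes Residual Gaussian Splatting (RGS) for ultra-sparse-view CBCT reconstruction, improving detail fidelity. | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 16 | Stop Holding Your Breath: CT-Informed Gaussian Splatting for Dynamic Bronchoscopy | Proposes a CT-guided Gaussian Splatting method for dynamic bronchoscopy that does not require breath-holding. | gaussian splatting, splatting | ✅ | |
| 17 | TransVLM: A Vision-Language Framework and Benchmark for Detecting Any Shot Transitions | Proposes TransVLM to detect any type of shot transition in video, addressing conventional methods' weak handling of complex transitions. | optical flow, motion representation, spatiotemporal | ✅ | |
| 18 | Training-Free Tunnel Defect Inspection and Engineering Interpretation via Visual Recalibration and Entity Reconstruction | Proposes TunnelMIND, which performs training-free tunnel defect detection and engineering interpretation via visual recalibration and entity reconstruction. | open-vocabulary, open vocabulary, foundation model | | |
| 19 | 3D Reconstruction Techniques in the Manufacturing Domain: Applications, Research Opportunities and Use Cases | Surveys 3D reconstruction techniques in manufacturing, covering applications, research opportunities, and use cases to fill the gap left by the lack of a unified framework. | 3D reconstruction | | |
| 20 | RayFormer: Modeling Inter- and Intra-Ray Similarity for NeRF-Based Video Snapshot Compressive Imaging | RayFormer: improves the quality of NeRF-based video snapshot compressive imaging by modeling inter-ray and intra-ray similarity. | NeRF | | |
| 21 | Softmax-GS: Generalized Gaussians Learning When to Blend or Bound | Proposes Softmax-GS, generalized Gaussians that learn when to blend or bound, to address overlap between 3D Gaussians. | 3D gaussian splatting, gaussian splatting, splatting | | |
| 22 | VkSplat: High-Performance 3DGS Training in Vulkan Compute | VkSplat: a high-performance 3DGS training framework built on Vulkan Compute. | 3D gaussian splatting, 3DGS, gaussian splatting | ✅ | |
| 23 | REALM: An RGB and Event Aligned Latent Manifold for Cross-Modal Perception | REALM: proposes an RGB- and event-aligned latent manifold for cross-modal perception. | depth estimation, feature matching, foundation model | | |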
🔬 Pillar 2: RL & Architecture (10 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 24 | PRISM: Pre-alignment via Black-box On-policy Distillation for Multimodal Reinforcement Learning | PRISM: improves multimodal reinforcement learning through pre-alignment via black-box on-policy distillation. | reinforcement learning, distillation, multimodal | ✅ | |
| 25 | HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation | Proposes HERMES++, an autonomous-driving world model that unifies 3D scene understanding and future geometry prediction. | world model, world models, scene understanding | ✅ | |
| 26 | Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling | Proposes a five-level taxonomy of intelligent visual generation, tracing its evolution from atomic mapping toward agentic world modeling. | flow matching, world model, world models | | |
| 27 | Echo-α: Large Agentic Multimodal Reasoning Model for Ultrasound Interpretation | Proposes Echo-α, an agentic multimodal reasoning model for ultrasound image interpretation. | reinforcement learning, large language model, multimodal | ✅ | |
| 28 | JI-ADF: Joint-Individual Learning with Adaptive Decision Fusion for Multimodal Skin Lesion Classification | Proposes the JI-ADF framework, which fuses multimodal information to improve the accuracy and clinical utility of skin lesion classification. | representation learning, multimodal | | |
| 29 | Leveraging Verifier-Based Reinforcement Learning in Image Editing | Proposes the Edit-R1 framework, which uses verifier-based reinforcement learning to improve image editing. | reinforcement learning, RLHF, chain-of-thought | | |
| 30 | Action Motifs: Self-Supervised Hierarchical Representation of Human Body Movements | Proposes A4Mer, a self-supervised learning framework for hierarchical representation of human body movements that improves behavior modeling. | representation learning, SMPL, motion prediction | | |
| 31 | Generalizable Sparse-View 3D Reconstruction from Unconstrained Images | GenWildSplat: a generalizable sparse-view 3D reconstruction framework for unconstrained images. | curriculum learning, 3D reconstruction | | |
| 32 | Beyond Gaussian Bottlenecks: Topologically Aligned Encoding of Vision-Transformer Feature Spaces | Proposes S²VAE, which encodes Vision Transformer feature spaces with topological alignment to improve 3D reconstruction. | world model, world models, depth estimation | | |
| 33 | LA-Pose: Latent Action Pretraining Meets Pose Estimation | LA-Pose: uses latent action pretraining to improve camera pose estimation accuracy. | world model, world models | | |
🔬 Pillar 6: Video Extraction (3 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 34 | ResiHMR: Residual-Limb Aware Single-Image 3D Human Mesh Recovery for Individuals with Limb Loss | ResiHMR: residual-limb-aware single-image 3D human mesh recovery for individuals with limb loss. | human mesh recovery, HMR | | |
| 35 | MoCapAnything V2: End-to-End Motion Capture for Arbitrary Skeletons | MoCapAnything V2: an end-to-end motion capture framework that drives animation for arbitrary skeletons. | video-to-pose | ✅ | |
| 36 | Adaptive Geodesic Conformal Prediction for Egocentric Camera Pose Estimation | Proposes DINOv2-Bridge adaptive conformal prediction, improving uncertainty coverage for egocentric camera pose estimation. | egocentric | | |
🔬 Pillar 1: Robot Control (2 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 37 | Fake3DGS: A Benchmark for 3D Manipulation Detection in Neural Rendering | Proposes the Fake3DGS benchmark for evaluating 3D manipulation detection algorithms in neural rendering. | manipulation, 3D gaussian splatting, 3D reconstruction | | |
| 38 | SpaAct: Spatially-Activated Transition Learning with Curriculum Adaptation for Vision-Language Navigation | SpaAct: improves vision-language navigation through spatially-activated transition learning and curriculum adaptation. | locomotion, curriculum learning, VLN | | |
🔬 Pillar 7: Motion Retargeting (2 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 39 | CasLayout: Cascaded 3D Layout Diffusion for Indoor Scene Synthesis with Implicit Relation Modeling | CasLayout: a cascaded diffusion model for indoor scene synthesis with implicit relation modeling. | spatial relationship, large language model | | |
| 40 | 3D-ReGen: A Unified 3D Geometry Regeneration Framework | Proposes 3D-ReGen, a controllable 3D geometry regeneration framework for 3D object enhancement, reconstruction, and editing. | geometric consistency | | |
🔬 Pillar 8: Physics-based Animation (2 papers)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 41 | YOSE: You Only Select Essential Tokens for Efficient DiT-based Video Object Removal | YOSE: an efficient DiT-based video object removal framework that substantially accelerates inference by selecting only essential tokens. | spatiotemporal, diff-sim | ✅ | |
| 42 | MAEPose: Self-Supervised Spatiotemporal Learning for Human Pose Estimation on mmWave Video | MAEPose: self-supervised spatiotemporal human pose estimation on mmWave video. | spatiotemporal | | |
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-line Summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
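| 43 | Uni-HOI: A Unified framework for Learning the Joint distribution of Text and Human-Object Interaction | Uni-HOI: a unified framework that learns the joint distribution of text and human-object interaction, enabling multi-task HOI generation and prediction. | motion generation, VQ-VAE, human-object interaction | | |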