cs.CV (2025-12-23)

📊 17 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (7 🔗2) · Pillar 3: Perception & Semantics (4 🔗1) · Pillar 2: RL & Architecture (3) · Pillar 7: Motion Retargeting (1) · Pillar 1: Robot Control (1 🔗1) · Pillar 6: Video Extraction (1)

🔬 Pillar 9: Embodied Foundation Models (7 papers)

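| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 1 | NULLBUS: Multimodal Mixed-Supervision for Breast Ultrasound Segmentation via Nullable Global-Local Prompts | NullBUS: multimodal mixed-supervision breast ultrasound segmentation via nullable global-local prompts | multimodal | |
| 2 | FlashVLM: Text-Guided Visual Token Selection for Large Multimodal Models | FlashVLM: text-guided visual token selection that improves the efficiency of large multimodal models | multimodal | |
| 3 | VideoScaffold: Elastic-Scale Visual Hierarchies for Streaming Video Understanding in MLLMs | VideoScaffold: elastic-scale visual hierarchies for streaming video understanding in MLLMs | large language model, multimodal | |
| 4 | Input-Adaptive Visual Preprocessing for Efficient Fast Vision-Language Model Inference | Proposes an input-adaptive visual preprocessing method that improves FastVLM inference efficiency on visual question answering tasks | multimodal | |
| 5 | SpatialTree: How Spatial Abilities Branch Out in MLLMs | Builds SpatialTree to systematically evaluate and improve the spatial cognition abilities of MLLMs | multimodal | |
| 6 | Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models | Proposes the DSR Suite and a geometry selection module (GSM) to improve the dynamic spatial reasoning of VLMs | foundation model | |
| 7 | Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark | Proposes the NL-DIR benchmark dataset for retrieving document images from natural language descriptions | large language model | |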

🔬 Pillar 3: Perception & Semantics (4 papers)

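| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 8 | Enhancing annotations for 5D apple pose estimation through 3D Gaussian Splatting (3DGS) | Uses 3D Gaussian Splatting to improve annotation efficiency for 5D apple pose estimation | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 9 | SirenPose: Dynamic Scene Reconstruction via Geometric Supervision | SirenPose: accurate, temporally consistent dynamic scene reconstruction via geometric supervision | scene reconstruction, physically plausible, spatiotemporal | |
| 10 | AlignPose: Generalizable 6D Pose Estimation via Multi-view Feature-metric Alignment | AlignPose: generalizable 6D pose estimation based on multi-view feature-metric alignment | 6D pose estimation | |
| 11 | SmartSplat: Feature-Smart Gaussians for Scalable Compression of Ultra-High-Resolution Images | SmartSplat: a feature-aware GS image compression framework for efficient compression and high-quality reconstruction of ultra-high-resolution images | 3D gaussian splatting, gaussian splatting, splatting | |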

🔬 Pillar 2: RL & Architecture (3 papers)

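| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 12 | Bridging Modalities and Transferring Knowledge: Enhanced Multimodal Understanding and Recognition | Proposes multimodal alignment, translation, fusion, and transfer methods to improve understanding and recognition of complex inputs | distillation, egocentric, multimodal | |
| 13 | Active Intelligence in Video Avatars via Closed-loop World Modeling | Proposes the ORCA framework, which gives video avatars active intelligence through closed-loop world modeling | world model | |
| 14 | Multi-objective hybrid knowledge distillation for efficient deep learning in smart agriculture | Proposes a multi-objective hybrid knowledge distillation framework for efficient deep learning in smart agriculture | distillation | |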

🔬 Pillar 7: Motion Retargeting (1 paper)

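| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 15 | Beyond Motion Pattern: An Empirical Study of Physical Forces for Human Motion Understanding | Motion understanding that incorporates physical force information, improving gait recognition, action recognition, and video captioning performance | human motion | |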

🔬 Pillar 1: Robot Control (1 paper)

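| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 16 | LEAD: Minimizing Learner-Expert Asymmetry in End-to-End Driving | LEAD: minimizes learner-expert asymmetry in end-to-end driving, improving driving performance in the CARLA simulator | sim-to-real, imitation learning | |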

🔬 Pillar 6: Video Extraction (1 paper)

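| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 17 | DETACH : Decomposed Spatio-Temporal Alignment for Exocentric Video and Ambient Sensors with Staged Learning | Proposes the DETACH framework, which uses decomposed spatio-temporal alignment to fuse exocentric video with ambient sensors | egocentric | |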
