cs.CV (2025-12-24)

📊 25 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (9 🔗3) · Pillar 2: RL Algorithms & Architecture (8 🔗1) · Pillar 3: Spatial Perception & Semantics (5 🔗1) · Pillar 5: Interaction & Reaction (1) · Pillar 1: Robot Control (1) · Pillar 4: Generative Motion (1)

🔬 Pillar 9: Embodied Foundation Models (9 papers)

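| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | PanoGrounder: Bridging 2D and 3D with Panoramic Scene Representations for VLM-based 3D Visual Grounding | PanoGrounder: bridges 2D and 3D with panoramic scene representations for VLM-based 3D visual grounding. | visual grounding | |
| 2 | Streaming Video Instruction Tuning | Proposes Streamo, a general-purpose interactive assistant for real-time streaming video understanding. | multimodal, instruction following | |
| 3 | TGC-Net: A Structure-Aware and Semantically-Aligned Framework for Text-Guided Medical Image Segmentation | Proposes TGC-Net to address text guidance in medical image segmentation. | large language model, multimodal | |
| 4 | Fast SAM2 with Text-Driven Token Pruning | Proposes Fast SAM2 with text-driven token pruning, accelerating video segmentation and reducing resource consumption. | foundation model | |
| 5 | Leveraging Lightweight Entity Extraction for Scalable Event-Based Image Retrieval | Proposes a two-stage image retrieval method based on event-centric entity extraction, improving retrieval accuracy in complex scenes. | multimodal | |
| 6 | Latent Implicit Visual Reasoning | Proposes Latent Implicit Visual Reasoning, improving the visual reasoning ability of LMMs without explicit supervision. | multimodal | |
| 7 | T2AV-Compass: Towards Unified Evaluation for Text-to-Audio-Video Generation | Proposes T2AV-Compass for unified evaluation of text-to-audio-video generation models. | instruction following | |
| 8 | Reasoning-Driven Amodal Completion: Collaborative Agents and Perceptual Evaluation | Proposes an amodal completion framework based on collaborative multi-agent reasoning, addressing semantic consistency and structural integrity. | chain-of-thought | |
| 9 | Beyond Weight Adaptation: Feature-Space Domain Injection for Cross-Modal Ship Re-Identification | Proposes a domain representation injection method for cross-modal ship re-identification. | foundation model | |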

🔬 Pillar 2: RL Algorithms & Architecture (8 papers)

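| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 10 | Multimodal Skeleton-Based Action Representation Learning via Decomposition and Composition | Proposes a decomposition-and-composition framework for multimodal skeleton-based action representation learning, improving efficiency and performance. | representation learning, multimodal | |
| 11 | SegMo: Segment-aligned Text to 3D Human Motion Generation | Proposes SegMo, a framework that aligns text and motion segments for finer-grained text-driven 3D human motion generation. | contrastive learning, motion generation | |
| 12 | Surgical Scene Segmentation using a Spike-Driven Video Transformer with Real-Time Potential | Proposes SpikeSurgSeg, a video Transformer based on spiking neural networks, for real-time surgical scene segmentation. | representation learning, scene understanding, spatiotemporal | |
| 13 | TICON: A Slide-Level Tile Contextualizer for Histopathology Representation Learning | TICON: a slide-level context modeling method for histopathology representation learning. | representation learning, foundation model | |
| 14 | Learning from Next-Frame Prediction: Autoregressive Video Modeling Encodes Effective Representations | NExT-Vid: proposes an autoregressive video modeling framework based on next-frame prediction, improving video representation learning. | flow matching, representation learning, visual pre-training | |
| 15 | Self-supervised Multiplex Consensus Mamba for General Image Fusion | Proposes the SMC-Mamba framework for general image fusion, improving performance across multiple fusion tasks. | Mamba, contrastive learning | |
| 16 | PUFM++: Point Cloud Upsampling via Enhanced Flow Matching | Proposes PUFM++ for upsampling sparse point clouds. | flow matching | |
| 17 | XGrid-Mapping: Explicit Implicit Hybrid Grid Submaps for Efficient Incremental Neural LiDAR Mapping | Proposes XGrid-Mapping, which uses explicit-implicit hybrid grid submaps for efficient incremental neural LiDAR mapping. | distillation, implicit representation | |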

🔬 Pillar 3: Spatial Perception & Semantics (5 papers)

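| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 18 | Quantile Rendering: Efficiently Embedding High-dimensional Feature on 3D Gaussian Splatting | Proposes Quantile Rendering, which efficiently embeds high-dimensional features in 3D Gaussian Splatting and improves open-vocabulary segmentation performance. | 3D gaussian splatting, gaussian splatting, splatting | |
| 19 | ORCA: Object Recognition and Comprehension for Archiving Marine Species | ORCA: a multimodal benchmark for object recognition and comprehension for archiving marine species. | open-vocabulary, open vocabulary, visual grounding | |
| 20 | Optical Flow-Guided 6DoF Object Pose Tracking with an Event Camera | Proposes an optical-flow-guided 6DoF object pose tracking method for event cameras. | optical flow | |
| 21 | Towards Arbitrary Motion Completing via Hierarchical Continuous Representation | Proposes the NAME framework, based on hierarchical continuous representation, for motion completion at arbitrary frame rates. | implicit representation | |
| 22 | UniPR-3D: Towards Universal Visual Place Recognition with Visual Geometry Grounded Transformer | Proposes UniPR-3D, which uses a visual-geometry-grounded Transformer for universal visual place recognition. | VGGT | |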

🔬 Pillar 5: Interaction & Reaction (1 paper)

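| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 23 | DGSAN: Dual-Graph Spatiotemporal Attention Network for Pulmonary Nodule Malignancy Prediction | Proposes a dual-graph spatiotemporal attention network to improve the accuracy of pulmonary nodule malignancy prediction. | mutual attention, spatiotemporal, multimodal | |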

🔬 Pillar 1: Robot Control (1 paper)

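| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 24 | Human Motion Estimation with Everyday Wearables | EveryWear: lightweight, calibration-free full-body human motion estimation with everyday wearables. | sim-to-real, teacher-student, egocentric | |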

🔬 Pillar 4: Generative Motion (1 paper)

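| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 25 | ACD: Direct Conditional Control for Video Diffusion Models via Attention Supervision | ACD: direct conditional control in video diffusion models via attention supervision. | classifier-free guidance | |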
