cs.CV (2025-10-17)

📊 23 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (9 🔗1) · Pillar 2: RL Algorithms & Architecture (8 🔗2) · Pillar 7: Motion Retargeting (2 🔗1) · Pillar 1: Robot Control (1) · Pillar 9: Embodied Foundation Models (1) · Pillar 6: Video Extraction & Matching (1) · Pillar 8: Physics-based Animation (1 🔗1)

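🔬 Pillar 3: Spatial Perception & Semantics (9 papers)

# | Title | One-line summary | Tags | 🔗
1 | PFGS: Pose-Fused 3D Gaussian Splatting for Complete Multi-Pose Object Reconstruction | Proposes PFGS, which achieves complete multi-pose object reconstruction through pose-fused 3D Gaussian splatting. | 3D gaussian splatting, 3DGS, gaussian splatting
2 | MARIS: Marine Open-Vocabulary Instance Segmentation with Geometric Enhancement and Semantic Alignment | Proposes MARIS, a benchmark for underwater open-vocabulary instance segmentation, and designs the GPEM and SAIM modules to improve segmentation performance. | open-vocabulary, open vocabulary
3 | H2OFlow: Grounding Human-Object Affordances with 3D Generative Models and Dense Diffused Flows | H2OFlow: learns human-object interactions using 3D generative models and dense diffused flows. | affordance, human-object interaction, HOI
4 | Proactive Scene Decomposition and Reconstruction | Proposes a proactive scene decomposition and reconstruction method that uses human-object interaction to dynamically refine scene understanding. | gaussian splatting, splatting, human-object interaction
5 | A Novel Combined Optical Flow Approach for Comprehensive Micro-Expression Recognition | Proposes a micro-expression recognition method that combines optical flow from the onset-to-apex and apex-to-offset phases. | optical flow
6 | DriveGen3D: Boosting Feed-Forward Driving Scene Generation with Efficient Video Diffusion | DriveGen3D: accelerates feed-forward driving scene generation through efficient video diffusion. | scene reconstruction, multimodal
7 | Neuro-Symbolic Spatial Reasoning in Segmentation | Proposes RelateSeg, which improves open-vocabulary semantic segmentation through neuro-symbolic spatial reasoning. | open-vocabulary, open vocabulary
8 | SANR: Scene-Aware Neural Representation for Light Field Image Compression with Rate-Distortion Optimization | Proposes SANR, a scene-aware neural representation framework for light field image compression with rate-distortion optimization. | scene reconstruction
9 | DGME-T: Directional Grid Motion Encoding for Transformer-Based Historical Camera Movement Classification | Proposes DGME-T, which uses directional grid motion encoding to make Transformers more robust at classifying camera movement in historical footage. | optical flow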

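🔬 Pillar 2: RL Algorithms & Architecture (8 papers)

# | Title | One-line summary | Tags | 🔗
10 | BLIP3o-NEXT: Next Frontier of Native Image Generation | BLIP3o-NEXT: a new frontier in native image generation that unifies text-to-image generation and image editing. | reinforcement learning, foundation model, multimodal
11 | UniMedVL: Unifying Medical Multimodal Understanding And Generation Through Observation-Knowledge-Analysis | Proposes UniMedVL, which unifies medical multimodal understanding and generation to improve performance in medical diagnostic applications. | curriculum learning, multimodal
12 | Layer as Puzzle Pieces: Compressing Large Language Models through Layer Concatenation | Proposes CoMe, which compresses large language models through layer concatenation and preserves performance under substantial pruning. | distillation, large language model
13 | VM-BeautyNet: A Synergistic Ensemble of Vision Transformer and Mamba for Facial Beauty Prediction | VM-BeautyNet: a facial beauty prediction model that fuses Vision Transformer and Mamba. | Mamba, MAE, spatial relationship
14 | Cortical-SSM: A Deep State Space Model for EEG and ECoG Motor Imagery Decoding | Proposes Cortical-SSM, a deep state space model for decoding motor imagery signals from EEG and ECoG. | SSM, state space model
15 | StretchySnake: Flexible SSM Training Unlocks Action Recognition Across Spatio-Temporal Scales | StretchySnake: flexible SSM training unlocks action recognition across spatio-temporal scales. | SSM, state space model
16 | Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset | Proposes the Ditto framework, which uses the high-quality synthetic dataset Ditto-1M to significantly improve instruction-based video editing. | curriculum learning, instruction following
17 | Select Less, Reason More: Prioritizing Evidence Purity for Video Reasoning | Proposes EARL, an evidence-prioritized adaptive framework that addresses information dilution in long-video reasoning with video LLMs. | reinforcement learning, large language model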

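🔬 Pillar 7: Motion Retargeting (2 papers)

# | Title | One-line summary | Tags | 🔗
18 | Imaginarium: Vision-guided High-Quality 3D Scene Layout Generation | Imaginarium: proposes a vision-guided method for high-quality 3D scene layout generation that improves scene richness and quality. | spatial relationship, large language model
19 | MRASfM: Multi-Camera Reconstruction and Aggregation through Structure-from-Motion in Driving Scenes | MRASfM: proposes a multi-camera SfM framework that tackles scene reconstruction in autonomous driving. | spatial relationship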

🔬 Pillar 1: Robot Control (1 paper)

# | Title | One-line summary | Tags | 🔗
20 | Embody 3D: A Large-scale Multimodal Motion and Behavior Dataset | Meta releases Embody 3D, a large-scale multimodal human motion and behavior dataset. | locomotion, multimodal

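🔬 Pillar 9: Embodied Foundation Models (1 paper)

# | Title | One-line summary | Tags | 🔗
21 | Towards Label-Free Brain Tumor Segmentation: Unsupervised Learning with Multimodal MRI | Proposes an unsupervised brain tumor segmentation method based on multimodal MRI to address the scarcity of annotated data. | multimodal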

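🔬 Pillar 6: Video Extraction & Matching (1 paper)

# | Title | One-line summary | Tags | 🔗
22 | Aria Gen 2 Pilot Dataset | Releases the Aria Gen 2 Pilot Dataset for egocentric multimodal perception research on wearable devices. | egocentric, multimodal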

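🔬 Pillar 8: Physics-based Animation (1 paper)

# | Title | One-line summary | Tags | 🔗
23 | LILAC: Long-sequence Incremental Low-latency Arbitrary Motion Stylization via Streaming VAE-Diffusion with Causal Decoding | LILAC: long-sequence, incremental, low-latency arbitrary motion stylization based on streaming VAE-Diffusion and causal decoding. | character control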
