cs.CV (2025-10-25)

📊 17 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (6, 🔗1) · Pillar 3: Spatial Perception & Semantics (5, 🔗2) · Pillar 6: Video Extraction & Matching (2) · Pillar 9: Embodied Foundation Models (2) · Pillar 4: Generative Motion (1) · Pillar 7: Motion Retargeting (1)

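🔬 Pillar 2: RL Algorithms & Architecture (6 papers)

| # | Title | One-sentence summary | Tags |
|---|-------|----------------------|------|
| 1 | Cross-Enhanced Multimodal Fusion of Eye-Tracking and Facial Features for Alzheimer's Disease Diagnosis | Proposes a cross-enhanced multimodal fusion framework that combines eye-tracking and facial features for Alzheimer's disease diagnosis. | representation learning, multimodal |
| 2 | GRPO-Guard: Mitigating Implicit Over-Optimization in Flow Matching via Regulated Clipping | GRPO-Guard: mitigates implicit over-optimization in flow matching via regulated clipping. | reinforcement learning, PPO, flow matching |
| 3 | CityRiSE: Reasoning Urban Socio-Economic Status in Vision-Language Models via Reinforcement Learning | Proposes CityRiSE, which uses reinforcement learning to improve vision-language models' reasoning about urban socio-economic status. | reinforcement learning, reward design |
| 4 | LOC: A General Language-Guided Framework for Open-Set 3D Occupancy Prediction | LOC: a general language-guided framework for open-set 3D occupancy prediction. | contrastive learning, distillation, scene understanding |
| 5 | Beyond Augmentation: Leveraging Inter-Instance Relation in Self-Supervised Representation Learning | Proposes a graph-neural-network-based self-supervised learning method that exploits inter-instance relations to improve representation quality. | representation learning |
| 6 | LongCat-Video Technical Report | LongCat-Video: an efficient long-video generation model based on a diffusion Transformer. | RLHF, world model |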

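🔬 Pillar 3: Spatial Perception & Semantics (5 papers)

| # | Title | One-sentence summary | Tags |
|---|-------|----------------------|------|
| 7 | EndoSfM3D: Learning to 3D Reconstruct Any Endoscopic Surgery Scene using Self-supervised Foundation Model | EndoSfM3D: learns 3D reconstruction of endoscopic surgery scenes using a self-supervised foundation model. | depth estimation, monocular depth, Depth Anything |
| 8 | I2-NeRF: Learning Neural Radiance Fields Under Physically-Grounded Media Interactions | I2-NeRF: proposes a physically plausible neural radiance field that improves 3D reconstruction under media degradation. | NeRF, neural radiance field |
| 9 | CogStereo: Neural Stereo Matching with Implicit Spatial Cognition Embedding | CogStereo: neural stereo matching with implicit spatial cognition embedding, improving zero-shot generalization. | monocular depth, scene understanding, scene flow |
| 10 | DynamicTree: Interactive Real Tree Animation via Sparse Voxel Spectrum | DynamicTree: interactive animation of real trees via a sparse voxel spectrum. | 3DGS, gaussian splatting, splatting |
| 11 | STG-Avatar: Animatable Human Avatars via Spacetime Gaussian | Proposes STG-Avatar, which reconstructs high-fidelity animatable human avatars through spacetime Gaussian optimization. | 3DGS, optical flow |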

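🔬 Pillar 6: Video Extraction & Matching (2 papers)

| # | Title | One-sentence summary | Tags |
|---|-------|----------------------|------|
| 12 | Benchmarking Egocentric Multimodal Goal Inference for Assistive Wearable Agents | WAGIBench: a benchmark for egocentric multimodal goal inference for assistive wearable agents. | egocentric, multimodal |
| 13 | egoEMOTION: Egocentric Vision and Physiological Signals for Emotion and Personality Recognition in Real-World Tasks | egoEMOTION: a dataset for emotion and personality recognition that combines egocentric vision with physiological signals. | egocentric, egocentric vision |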

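🔬 Pillar 9: Embodied Foundation Models (2 papers)

| # | Title | One-sentence summary | Tags |
|---|-------|----------------------|------|
| 14 | Mitigating Coordinate Prediction Bias from Positional Encoding Failures | Proposes Vision-PE Shuffle Guidance to correct coordinate prediction bias in MLLMs and improve localization accuracy. | large language model, multimodal |
| 15 | HARMONY: Hidden Activation Representations and Model Output-Aware Uncertainty Estimation for Vision-Language Models | Proposes HARMONY, which uses hidden-layer activations and model outputs to improve uncertainty estimation for vision-language models. | multimodal |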

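🔬 Pillar 4: Generative Motion (1 paper)

| # | Title | One-sentence summary | Tags |
|---|-------|----------------------|------|
| 16 | MOGRAS: Human Motion with Grasping in 3D Scenes | MOGRAS: proposes a large-scale dataset and benchmark method for human motion with grasping interactions in 3D scenes. | physically plausible, human-scene interaction |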

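🔬 Pillar 7: Motion Retargeting (1 paper)

| # | Title | One-sentence summary | Tags |
|---|-------|----------------------|------|
| 17 | GRAID: Enhancing Spatial Reasoning of VLMs Through High-Fidelity Data Generation | GRAID: enhances the spatial reasoning of vision-language models through high-quality data generation. | spatial relationship |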
