cs.CV (2025-08-26)

📊 20 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (6 🔗1) · Pillar 3: Perception & Semantics (5 🔗1) · Pillar 2: RL & Architecture (4 🔗1) · Pillar 5: Interaction & Reaction (2 🔗1) · Pillar 4: Generative Motion (1 🔗1) · Pillar 1: Robot Control (1 🔗1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 9: Embodied Foundation Models (6 papers)

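# | Title | One-sentence summary | Tags | 🔗
1 | Dual Enhancement on 3D Vision-Language Perception for Monocular 3D Visual Grounding | Proposes a dual-enhancement method for monocular 3D visual grounding | visual grounding |
2 | Decouple, Reorganize, and Fuse: A Multimodal Framework for Cancer Survival Prediction | Proposes the DeReF framework to address information fusion in cancer survival prediction | multimodal |
3 | Beyond the Textual: Generating Coherent Visual Options for MCQs | Proposes a cross-modal option synthesis framework to generate multiple-choice questions with visual options | multimodal, chain-of-thought |
4 | Autoregressive Universal Video Segmentation Model | Proposes an autoregressive universal video segmentation model to address prompt-free segmentation | foundation model |
5 | Event-Enriched Image Analysis Grand Challenge at ACM Multimedia 2025 | Proposes the EVENTA challenge to address event-level multimodal understanding | multimodal |
6 | OwlCap: Harmonizing Motion-Detail for Video Captioning via HMD-270K and Caption Set Equivalence Reward | Proposes OwlCap to address the motion-detail imbalance in video captioning | large language model |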

🔬 Pillar 3: Perception & Semantics (5 papers)

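# | Title | One-sentence summary | Tags | 🔗
7 | ColorGS: High-fidelity Surgical Scene Reconstruction with Colored Gaussian Splatting | Proposes ColorGS to address color and deformation modeling for tissue reconstruction in endoscopic videos | 3D gaussian splatting, 3DGS, gaussian splatting |
8 | Can we make NeRF-based visual localization privacy-preserving? | Proposes ppNeSF to address privacy issues in NeRF-based visual localization | NeRF |
9 | PseudoMapTrainer: Learning Online Mapping without HD Maps | Proposes PseudoMapTrainer to remove the dependence of online mapping training on HD maps | gaussian splatting, splatting |
10 | SoccerNet 2025 Challenges Results | The SoccerNet 2025 challenges advance research on soccer video understanding | depth estimation, monocular depth |
11 | Robust and Label-Efficient Deep Waste Detection | Proposes an ensemble-based semi-supervised learning framework to improve waste detection efficiency | open-vocabulary, open vocabulary |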

🔬 Pillar 2: RL & Architecture (4 papers)

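# | Title | One-sentence summary | Tags | 🔗
12 | MIDAS: Multimodal Interactive Digital-humAn Synthesis via Real-time Autoregressive Video Generation | Proposes the MIDAS framework for real-time multimodal interactive digital human synthesis | world model, large language model, multimodal |
13 | Geo2Vec: Shape- and Distance-Aware Neural Representation of Geospatial Entities | Proposes Geo2Vec to address the high computational cost of representation learning for geospatial entities | representation learning, spatial relationship |
14 | Flatness-aware Curriculum Learning via Adversarial Difficulty | Proposes an adversarial difficulty measure to combine curriculum learning with flat minima | curriculum learning |
15 | Clustering-based Feature Representation Learning for Oracle Bone Inscriptions Detection | Proposes a clustering-based feature representation learning method for oracle bone inscription detection | representation learning |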

🔬 Pillar 5: Interaction & Reaction (2 papers)

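# | Title | One-sentence summary | Tags | 🔗
16 | Rethinking Human-Object Interaction Evaluation for both Vision-Language Models and HOI-Specific Methods | Proposes a new benchmark dataset for evaluating human-object interaction detection methods | human-object interaction, HOI |
17 | DQEN: Dual Query Enhancement Network for DETR-based HOI Detection | Proposes a dual query enhancement network for DETR-based HOI detection | human-object interaction, HOI |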

🔬 Pillar 4: Generative Motion (1 paper)

# | Title | One-sentence summary | Tags | 🔗
18 | OmniHuman-1.5: Instilling an Active Mind in Avatars via Cognitive Simulation | Proposes OmniHuman-1.5 to address emotional expression in video avatar animation | physically plausible, character animation, large language model |

🔬 Pillar 1: Robot Control (1 paper)

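# | Title | One-sentence summary | Tags | 🔗
19 | All-in-One Slider for Attribute Manipulation in Diffusion Models | Proposes an All-in-One Slider to address attribute manipulation in generated images | manipulation |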

🔬 Pillar 8: Physics-based Animation (1 paper)

# | Title | One-sentence summary | Tags | 🔗
20 | Wan-S2V: Audio-Driven Cinematic Video Generation | Proposes Wan-S2V to address complex film and television animation generation | character animation |
