cs.CV(2025-06-04)

📊 17 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

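Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (6) | Pillar 9: Embodied Large Models (Embodied Foundation Models) (4 🔗1) | Pillar 6: Video Extraction and Matching (Video Extraction) (3) | Pillar 2: RL Algorithms and Architecture (RL & Architecture) (2) | Pillar 5: Interaction and Reaction (Interaction & Reaction) (1 🔗1) | Pillar 1: Robot Control (1)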

🔬 Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (6 papers)

# | Title | Key point | Tags | 🔗
1 | FlexGS: Train Once, Deploy Everywhere with Many-in-One Flexible 3D Gaussian Splatting | Proposes FlexGS to address memory constraints in 3D Gaussian splatting rendering | 3D gaussian splatting, 3DGS, gaussian splatting
2 | Photoreal Scene Reconstruction from an Egocentric Device | Proposes visual-inertial bundle adjustment to address reconstruction from rolling-shutter cameras | gaussian splatting, splatting, scene reconstruction
3 | HuGeDiff: 3D Human Generation via Diffusion with Gaussian Splatting | Proposes HuGeDiff to address controllability and detail in 3D human generation | gaussian splatting, splatting, neural radiance field
4 | UniCUE: Unified Recognition and Generation Framework for Chinese Cued Speech Video-to-Speech Generation | Proposes the UniCUE framework to address Chinese Cued Speech video-to-speech generation | semantic mapping, semantic map, multimodal
5 | Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation | Proposes Voyager to address long-range, consistent 3D scene generation | metric depth
6 | GlobalBuildingAtlas: An Open Global and Complete Dataset of Building Polygons, Heights and LoD1 3D Models | Proposes GlobalBuildingAtlas to address the lack of global building data | height map

🔬 Pillar 9: Embodied Large Models (Embodied Foundation Models) (4 papers)

# | Title | Key point | Tags | 🔗
7 | MMR-V: What's Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos | Proposes MMR-V to address the challenges of multimodal video reasoning | large language model, multimodal, chain-of-thought
8 | Mitigating Hallucinations in Large Vision-Language Models via Entity-Centric Multimodal Preference Optimization | Proposes entity-centric multimodal preference optimization to address hallucinations in large vision-language models | large language model, multimodal
9 | Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning | Proposes Rex-Thinker to address the interpretability and reliability of object referring | chain-of-thought
10 | ReXVQA: A Large-scale Visual Question Answering Benchmark for Generalist Chest X-ray Understanding | Proposes ReXVQA to address benchmarking of chest X-ray visual question answering | large language model, multimodal

🔬 Pillar 6: Video Extraction and Matching (Video Extraction) (3 papers)

# | Title | Key point | Tags | 🔗
11 | Struct2D: A Perception-Guided Framework for Spatial Reasoning in MLLMs | Proposes the Struct2D framework to address spatial reasoning in MLLMs | egocentric, large language model, multimodal
12 | Seeing in the Dark: Benchmarking Egocentric 3D Vision with the Oxford Day-and-Night Dataset | Proposes the Oxford Day-and-Night dataset to address nighttime visual relocalization | egocentric
13 | SAVVY: Spatial Awareness via Audio-Visual LLMs through Seeing and Hearing | Proposes SAVVY to address dynamic 3D spatial reasoning | egocentric, large language model

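🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (2 papers)

# | Title | Key point | Tags | 🔗
14 | Language-Image Alignment with Fixed Text Encoders | Proposes the LIFT method to simplify language-image alignment | representation learning, contrastive learning, large language model
15 | Object-level Self-Distillation for Vision Pretraining | Proposes object-level self-distillation to address the limitations of image-level self-distillation | distillation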

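🔬 Pillar 5: Interaction and Reaction (Interaction & Reaction) (1 paper)

# | Title | Key point | Tags | 🔗
16 | Zero-Shot Temporal Interaction Localization for Egocentric Videos | Proposes EgoLoc to address temporal interaction localization in egocentric videos | human-object interaction, HOI, egocentric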

🔬 Pillar 1: Robot Control (1 paper)

# | Title | Key point | Tags | 🔗
17 | WorldPrediction: A Benchmark for High-level World Modeling and Long-horizon Procedural Planning | Proposes the WorldPrediction benchmark to address high-level world modeling and long-horizon planning | motion planning, world model
