cs.CV (2025-08-29)

📊 19 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (12 🔗1) · Pillar 2: RL & Architecture (4 🔗1) · Pillar 3: Perception & Semantics (2) · Pillar 5: Interaction & Reaction (1)

🔬 Pillar 9: Embodied Foundation Models (12 papers)

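| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 1 | Multimodal Deep Learning for Phyllodes Tumor Classification from Ultrasound and Clinical Data | Proposes a multimodal deep learning framework to improve phyllodes tumor classification accuracy | multimodal | |
| 2 | Foundation Model-Driven Classification of Atypical Mitotic Figures with Domain-Aware Training Strategies | Proposes a foundation-model-based classification method for recognizing atypical mitotic figures | foundation model | |
| 3 | From Drone Imagery to Livability Mapping: AI-powered Environment Perception in Rural China | Proposes a vision-language contrastive ranking framework for rural environment perception | large language model, multimodal, chain-of-thought | |
| 4 | MM-SeR: Multimodal Self-Refinement for Lightweight Image Captioning | Proposes MM-SeR to improve the reliability of lightweight image captioning | multimodal | |
| 5 | Safe-LLaVA: A Privacy-Preserving Vision-Language Dataset and Benchmark for Biometric Safety | Proposes Safe-LLaVA to address biometric information leakage in multimodal large language models | large language model, multimodal | |
| 6 | DriveQA: Passing the Driving Knowledge Test | Proposes DriveQA to address the challenges of driving knowledge tests | large language model, multimodal | |
| 7 | Integrating Pathology and CT Imaging for Personalized Recurrence Risk Prediction in Renal Cancer | Proposes a multimodal fusion method to improve the accuracy of renal cancer recurrence risk prediction | foundation model, multimodal | |
| 8 | Generative AI for Industrial Contour Detection: A Language-Guided Vision System | Proposes a language-guided generative vision system for industrial contour detection | multimodal | |
| 9 | Waste-Bench: A Comprehensive Benchmark for Evaluating VLLMs in Cluttered Environments | Proposes Waste-Bench for evaluating VLLMs in cluttered environments | large language model | |
| 10 | Domain Generalization in-the-Wild: Disentangling Classification from Domain-Aware Representations | Proposes CLIP-DCA to address challenges in domain generalization evaluation | foundation model | |
| 11 | Why Stop at Words? Unveiling the Bigger Picture through Line-Level OCR | Proposes line-level OCR to overcome the limitations of word-level OCR | large language model | |
| 12 | How Well Do Vision-Language Models Understand Cities? A Comparative Study on Spatial Reasoning from Street-View Images | Poses a new urban spatial reasoning challenge aimed at improving vision-language model performance | chain-of-thought | |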

🔬 Pillar 2: RL & Architecture (4 papers)

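| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 13 | ELV-Halluc: Benchmarking Semantic Aggregation Hallucinations in Long Video Understanding | Proposes ELV-Halluc to address semantic aggregation hallucinations in long video understanding | DPO, large language model, multimodal | |
| 14 | Beyond Pixels: Introducing Geometric-Semantic World Priors for Video-based Embodied Models via Spatio-temporal Alignment | Proposes VEME to address reasoning and planning in dynamic environments | world model, scene understanding, VLN | |
| 15 | What Can We Learn from Harry Potter? An Exploratory Study of Visual Representation Learning from Atypical Videos | Proposes using atypical videos to improve visual representations for open-world learning | representation learning | |
| 16 | UItron: Foundational GUI Agent with Advanced Perception and Planning | Proposes UItron for automated operation by GUI agents | reinforcement learning, foundation model | |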

🔬 Pillar 3: Perception & Semantics (2 papers)

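| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 17 | Scale-GS: Efficient Scalable Gaussian Splatting via Redundancy-filtering Training on Streaming Content | Proposes a scalable, efficient Gaussian splatting framework for training on dynamic scenes | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 18 | Complete Gaussian Splats from a Single Image with Denoising Diffusion Models | Proposes a latent-diffusion-based method for reconstructing complete Gaussian splats from a single image | gaussian splatting, splatting | |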

🔬 Pillar 5: Interaction & Reaction (1 paper)

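| # | Title | One-line summary | Tags | 🔗 |
|---|-------|------------------|------|----|
| 19 | ECHO: Ego-Centric modeling of Human-Object interactions | Proposes ECHO to address the challenges of modeling human-object interactions | human-object interaction, HOI, egocentric | |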
