cs.CV (2025-08-29)

📊 19 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (12 🔗1) · Pillar 2: RL & Architecture (4 🔗1) · Pillar 3: Perception & Semantics (2) · Pillar 5: Interaction & Reaction (1)

🔬 Pillar 9: Embodied Foundation Models (12 papers)

#	Title	Key point	Tags	🔗	⭐
1	Multimodal Deep Learning for Phyllodes Tumor Classification from Ultrasound and Clinical Data	Proposes a multimodal deep learning framework to improve phyllodes tumor classification accuracy	multimodal
2	Foundation Model-Driven Classification of Atypical Mitotic Figures with Domain-Aware Training Strategies	Proposes a foundation-model-based classification method for recognizing atypical mitotic figures	foundation model
3	From Drone Imagery to Livability Mapping: AI-powered Environment Perception in Rural China	Proposes a vision-language contrastive ranking framework for rural environment perception	large language model multimodal chain-of-thought
4	MM-SeR: Multimodal Self-Refinement for Lightweight Image Captioning	Proposes MM-SeR to improve the reliability of lightweight image captioning	multimodal
5	Safe-LLaVA: A Privacy-Preserving Vision-Language Dataset and Benchmark for Biometric Safety	Proposes Safe-LLaVA to address biometric information leakage in multimodal large language models	large language model multimodal
6	DriveQA: Passing the Driving Knowledge Test	Proposes DriveQA to address the challenges of driving knowledge tests	large language model multimodal
7	Integrating Pathology and CT Imaging for Personalized Recurrence Risk Prediction in Renal Cancer	Proposes a multimodal fusion method to improve the accuracy of renal cancer recurrence risk prediction	foundation model multimodal
8	Generative AI for Industrial Contour Detection: A Language-Guided Vision System	Proposes a language-guided generative vision system for industrial contour detection	multimodal
9	Waste-Bench: A Comprehensive Benchmark for Evaluating VLLMs in Cluttered Environments	Proposes Waste-Bench for evaluating VLLMs in cluttered environments	large language model
10	Domain Generalization in-the-Wild: Disentangling Classification from Domain-Aware Representations	Proposes CLIP-DCA to address challenges in evaluating domain generalization	foundation model
11	Why Stop at Words? Unveiling the Bigger Picture through Line-Level OCR	Proposes line-level OCR to overcome the limitations of word-level OCR	large language model	✅
12	How Well Do Vision-Language Models Understand Cities? A Comparative Study on Spatial Reasoning from Street-View Images	Poses a new urban spatial reasoning challenge to improve vision-language model performance	chain-of-thought

🔬 Pillar 2: RL & Architecture (4 papers)

#	Title	Key point	Tags	🔗	⭐
13	ELV-Halluc: Benchmarking Semantic Aggregation Hallucinations in Long Video Understanding	Proposes ELV-Halluc to address semantic aggregation hallucinations in long video understanding	DPO large language model multimodal
14	Beyond Pixels: Introducing Geometric-Semantic World Priors for Video-based Embodied Models via Spatio-temporal Alignment	Proposes VEME to address reasoning and planning in dynamic environments	world model scene understanding VLN
15	What Can We Learn from Harry Potter? An Exploratory Study of Visual Representation Learning from Atypical Videos	Proposes using atypical videos to improve visual representations for open-world learning	representation learning	✅
16	UItron: Foundational GUI Agent with Advanced Perception and Planning	Proposes UItron to address automated operation by GUI agents	reinforcement learning foundation model

🔬 Pillar 3: Perception & Semantics (2 papers)

#	Title	Key point	Tags	🔗	⭐
17	Scale-GS: Efficient Scalable Gaussian Splatting via Redundancy-filtering Training on Streaming Content	Proposes a scalable, efficient Gaussian splatting framework for training on dynamic scenes	3D gaussian splatting 3DGS gaussian splatting
18	Complete Gaussian Splats from a Single Image with Denoising Diffusion Models	Proposes a latent-diffusion-based method for reconstructing complete Gaussian splats from a single image	gaussian splatting splatting

🔬 Pillar 5: Interaction & Reaction (1 paper)

#	Title	Key point	Tags	🔗	⭐
19	ECHO: Ego-Centric modeling of Human-Object interactions	Proposes ECHO to address the challenges of modeling human-object interactions	human-object interaction HOI egocentric
