cs.CV (2025-05-12)

📊 23 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (9, 🔗2) · Pillar 3: Perception & Semantics (7, 🔗2) · Pillar 2: RL & Architecture (4) · Pillar 7: Motion Retargeting (1) · Pillar 6: Video Extraction (1, 🔗1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 9: Embodied Foundation Models (9 papers)

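# | Title | One-line summary | Tags | 🔗
1 | Vision Foundation Model Embedding-Based Semantic Anomaly Detection | Proposes a semantic anomaly detection framework based on vision foundation model embeddings to improve the safety of autonomous driving systems. | foundation model |
2 | Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning | Proposes Skywork-VL Reward, a reward model for multimodal understanding and reasoning tasks. | multimodal |
3 | Visually Interpretable Subtask Reasoning for Visual Question Answering | VISTAR: improves visual question answering through visually interpretable subtask reasoning. | large language model, multimodal |
4 | Critique Before Thinking: Mitigating Hallucination through Rationale-Augmented Instruction Tuning | Proposes the Re-Critic framework, which mitigates hallucination in multimodal large models by augmenting reasoning chains. | multimodal, chain-of-thought |
5 | Gameplay Highlights Generation | Proposes a multimodal method for automatically generating gameplay highlights based on a fine-tuned X-CLIP, with no game engine integration or OCR required. | multimodal |
6 | Self-Supervised Event Representations: Towards Accurate, Real-Time Perception on SoC FPGAs | Proposes a self-supervised event representation method for accurate, real-time event-camera perception on SoC FPGAs. | TAMP |
7 | Benchmarking Graph Neural Networks for Document Layout Analysis in Public Affairs | Proposes a graph neural network-based document layout analysis method to improve the accuracy of public affairs document understanding. | multimodal |
8 | L-SWAG: Layer-Sample Wise Activation with Gradients information for Zero-Shot NAS on Vision Transformers | Proposes L-SWAG to address the application of zero-cost neural architecture search to Vision Transformers. | large language model |
9 | Synthetic Similarity Search in Automotive Production | Proposes a synthetic-data-based similarity search approach for visual quality inspection in automotive production. | foundation model |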

🔬 Pillar 3: Perception & Semantics (7 papers)

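# | Title | One-line summary | Tags | 🔗
10 | TUM2TWIN: Introducing the Large-Scale Multimodal Urban Digital Twin Benchmark Dataset | TUM2TWIN: a large-scale multimodal urban digital twin benchmark dataset. | gaussian splatting, splatting, NeRF |
11 | SLAG: Scalable Language-Augmented Gaussian Splatting | SLAG: a scalable language-augmented Gaussian splatting method for fast embedding of large scenes. | gaussian splatting, splatting |
12 | TUGS: Physics-based Compact Representation of Underwater Scenes by Tensorized Gaussian | TUGS: a compact, physics-based representation of underwater scenes using tensorized Gaussians. | gaussian splatting, splatting, NeRF |
13 | GIFStream: 4D Gaussian-based Immersive Video with Feature Stream | GIFStream: proposes a feature-stream-based 4D Gaussian model for representing and compressing high-quality immersive video. | gaussian splatting, splatting |
14 | Geometric Prior-Guided Neural Implicit Surface Reconstruction in the Wild | Proposes a geometric-prior-guided neural implicit surface reconstruction method to address the reconstruction of in-the-wild scenes. | NeRF, neural radiance field |
15 | Asynchronous Multi-Object Tracking with an Event Camera | Proposes an asynchronous event-based multi-object tracking algorithm to address object detection in dynamic environments. | optical flow |
16 | Deep Learning Advances in Vision-Based Traffic Accident Anticipation: A Comprehensive Review of Methods, Datasets, and Future Directions | Reviews deep learning-based methods, datasets, and future directions for vision-based traffic accident anticipation. | scene understanding |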

🔬 Pillar 2: RL & Architecture (4 papers)

# | Title | One-line summary | Tags | 🔗
17 | SAMChat: Introducing Chain of Thought Reasoning and GRPO to a Multimodal Small Language Model for Small Scale Remote Sensing | Proposes SAMChat, which combines CoT reasoning with GRPO to improve small-scale remote sensing image analysis. | reinforcement learning, large language model, multimodal |
18 | Learning to Reason and Navigate: Parameter Efficient Action Planning with Large Language Models | Proposes PEAP-LLM to address navigation and localization in complex environments. | DPO, direct preference optimization, large language model |
19 | DanceGRPO: Unleashing GRPO on Visual Generation | DanceGRPO: uses GRPO to tackle alignment with human preferences in visual generation. | reinforcement learning, RLHF, foundation model |
20 | RealRep: Generalized SDR-to-HDR Conversion via Attribute-Disentangled Representation Learning | Proposes the RealRep framework, which achieves generalized SDR-to-HDR conversion through disentangled representation learning. | representation learning |

🔬 Pillar 7: Motion Retargeting (1 paper)

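# | Title | One-line summary | Tags | 🔗
21 | Human Motion Prediction via Test-domain-aware Adaptation with Easily-available Human Motions Estimated from Videos | Proposes a domain adaptation method based on human motions estimated from videos to improve human motion prediction performance. | human motion |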

🔬 Pillar 6: Video Extraction (1 paper)

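# | Title | One-line summary | Tags | 🔗
22 | Boosting Global-Local Feature Matching via Anomaly Synthesis for Multi-Class Point Cloud Anomaly Detection | Proposes GLFM, a method based on anomaly synthesis and global-local feature matching, for multi-class point cloud anomaly detection. | feature matching |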

🔬 Pillar 8: Physics-based Animation (1 paper)

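# | Title | One-line summary | Tags | 🔗
23 | Hybrid Spiking Vision Transformer for Object Detection with Event Cameras | Proposes the Hybrid Spiking Vision Transformer (HsVT) model to improve object detection performance with event cameras. | spatiotemporal |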
