cs.CV (2025-08-31)

📊 21 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (6 🔗1) · Pillar 2: RL & Architecture (6 🔗1) · Pillar 9: Embodied Foundation Models (6 🔗1) · Pillar 5: Interaction & Reaction (2) · Pillar 1: Robot Control (1)

🔬 Pillar 3: Perception & Semantics (6 papers)

| # | Title | One-line summary | Tags | 🔗 |
| --- | --- | --- | --- | --- |
| 1 | SWAGSplatting: Semantic-guided Water-scene Augmented Gaussian Splatting | Proposes SWAGSplatting for 3D reconstruction in underwater environments | 3D gaussian splatting, gaussian splatting, splatting | |
| 2 | GS-TG: 3D Gaussian Splatting Accelerator with Tile Grouping for Reducing Redundant Sorting while Preserving Rasterization Efficiency | Proposes GS-TG to address insufficient rendering speed in 3D Gaussian Splatting | 3D gaussian splatting, gaussian splatting, splatting | |
| 3 | MarkSplatter: Generalizable Watermarking for 3D Gaussian Splatting Model via Splatter Image Structure | Proposes MarkSplatter for watermark protection of 3D Gaussian Splatting models | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 4 | Towards Integrating Multi-Spectral Imaging with Gaussian Splatting | Integrates multi-spectral imaging with Gaussian Splatting to improve 3D reconstruction quality | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 5 | UPGS: Unified Pose-aware Gaussian Splatting for Dynamic Scene Deblurring | Proposes unified pose-aware Gaussian Splatting for dynamic scene deblurring | 3DGS, gaussian splatting, splatting | |
| 6 | ER-LoRA: Effective-Rank Guided Adaptation for Weather-Generalized Depth Estimation | Proposes ER-LoRA for depth estimation in adverse weather | depth estimation, monocular depth, foundation model | |

🔬 Pillar 2: RL & Architecture (6 papers)

| # | Title | One-line summary | Tags | 🔗 |
| --- | --- | --- | --- | --- |
| 7 | CSFMamba: Cross State Fusion Mamba Operator for Multimodal Remote Sensing Image Classification | Proposes CSFMamba to address computational complexity in multimodal remote sensing image classification | Mamba, SSM, state space model | |
| 8 | OmniReason: A Temporal-Guided Vision-Language-Action Framework for Autonomous Driving | Proposes the OmniReason framework for spatiotemporal reasoning in autonomous driving | distillation, scene understanding, spatiotemporal | |
| 9 | MV-SSM: Multi-View State Space Modeling for 3D Human Pose Estimation | Proposes the MV-SSM framework for multi-view 3D human pose estimation | Mamba, SSM, state space model | |
| 10 | Multi-Level CLS Token Fusion for Contrastive Learning in Endoscopy Image Classification | Proposes multi-level CLS token fusion for endoscopy image classification | contrastive learning, multimodal | |
| 11 | LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model | Proposes LLaVA-Critic-R1 to optimize multimodal generation and evaluation | reinforcement learning, multimodal | |
| 12 | CascadeFormer: A Family of Two-stage Cascading Transformers for Skeleton-based Human Action Recognition | Proposes CascadeFormer for skeleton-based human action recognition | representation learning, spatiotemporal | |

🔬 Pillar 9: Embodied Foundation Models (6 papers)

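| # | Title | One-line summary | Tags | 🔗 |
| --- | --- | --- | --- | --- |
| 13 | Multimodal Iterative RAG for Knowledge-Intensive Visual Question Answering | Proposes the MI-RAG framework to address knowledge acquisition in knowledge-intensive visual question answering | large language model, multimodal | |
| 14 | Fusion to Enhance: Fusion Visual Encoder to Enhance Multimodal Language Model | Proposes Fusion to Enhance to address the visual perception bottleneck of multimodal language models | large language model, multimodal | |
| 15 | Ultrasound-based detection and malignancy prediction of breast lesions eligible for biopsy: A multi-center clinical-scenario study using nomograms, large language models, and radiologist evaluation | Proposes an integrated ultrasound nomogram to improve the accuracy of biopsy recommendations for breast lesions | large language model | |
| 16 | Image-to-Brain Signal Generation for Visual Prosthesis with CLIP Guided Multimodal Diffusion Models | Proposes an image-to-brain-signal generation framework to address the encoding problem in visual prostheses | multimodal | |
| 17 | EVENT-Retriever: Event-Aware Multimodal Image Retrieval for Realistic Captions | Proposes EVENT-Retriever for event-based multimodal image retrieval | multimodal | |
| 18 | Prompt the Unseen: Evaluating Visual-Language Alignment Beyond Supervision | Proposes a new benchmark to evaluate how well the projection layer of vision-language models generalizes | large language model, multimodal | |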

🔬 Pillar 5: Interaction & Reaction (2 papers)

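| # | Title | One-line summary | Tags | 🔗 |
| --- | --- | --- | --- | --- |
| 19 | No More Sibling Rivalry: Debiasing Human-Object Interaction Detection | Proposes a new method to address bias in human-object interaction detection | human-object interaction, HOI | |
| 20 | Secure and Scalable Face Retrieval via Cancelable Product Quantization | Proposes cancelable product quantization to address privacy in face retrieval | OMOMO | |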

🔬 Pillar 1: Robot Control (1 paper)

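| # | Title | One-line summary | Tags | 🔗 |
| --- | --- | --- | --- | --- |
| 21 | InterPose: Learning to Generate Human-Object Interactions from Large-Scale Web Videos | Proposes InterPose for generating human-object interactions in complex scenes | manipulation, motion generation, human-object interaction | |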
