cs.CV (2025-08-21)

📊 28 papers in total | 🔗 6 with code

🎯 Areas of Interest

Pillar 9: Embodied Foundation Models (13 🔗3) · Pillar 3: Perception & Semantics (6 🔗1) · Pillar 2: RL & Architecture (6 🔗2) · Pillar 4: Generative Motion (1) · Pillar 6: Video Extraction (1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 9: Embodied Foundation Models (13 papers)

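| # | Title | One-line summary | Tags | 🔗 |
| 1 | Cross-Attention Multimodal Fusion for Breast Cancer Diagnosis: Integrating Mammography and Clinical Data with Explainability | Proposes a breast cancer diagnosis method based on cross-attention multimodal fusion, improving diagnostic accuracy and explainability. | multimodal | |
| 2 | Boosting Pathology Foundation Models via Few-shot Prompt-tuning for Rare Cancer Subtyping | PathPT: boosts pathology foundation models via few-shot prompt-tuning for rare cancer subtyping. | foundation model | |
| 3 | Deep Learning-Driven Multimodal Detection and Movement Analysis of Objects in Culinary | Proposes a deep-learning-based multimodal fusion method for culinary object detection and movement analysis. | multimodal | |
| 4 | BasketLiDAR: The First LiDAR-Camera Multimodal Dataset for Professional Basketball MOT | BasketLiDAR: the first LiDAR-camera multimodal dataset for basketball multi-object tracking, with a real-time tracking framework. | multimodal | |
| 5 | AeroDuo: Aerial Duo for UAV-based Vision and Language Navigation | AeroDuo: a dual-UAV collaborative vision-and-language navigation framework for UAV navigation in complex environments. | VLN, large language model, multimodal | |
| 6 | StreamMem: Query-Agnostic KV Cache Memory for Streaming Video Understanding | StreamMem: a query-agnostic KV cache memory mechanism for streaming video understanding. | large language model, multimodal | |
| 7 | RCDINO: Enhancing Radar-Camera 3D Object Detection with DINOv2 Semantic Features | RCDINO: enhances radar-camera 3D object detection with DINOv2 semantic features. | foundation model, multimodal | |
| 8 | VT-LVLM-AR: A Video-Temporal Large Vision-Language Model Adapter for Fine-Grained Action Recognition in Long-Term Videos | Proposes VT-LVLM-AR, a video-temporal large vision-language model adapter for fine-grained action recognition in long-term videos. | large language model | |
| 9 | SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass | SceneGen: a framework that generates 3D scenes from a single image in one feedforward pass. | embodied AI | |
| 10 | LLM-empowered Dynamic Prompt Routing for Vision-Language Models Tuning under Long-Tailed Distributions | Proposes the MDPR framework to address bias accumulation when fine-tuning VLMs under long-tailed distributions. | large language model | |
| 11 | When and What: Diffusion-Grounded VideoLLM with Entity Aware Segmentation for Long Video Understanding | Grounded VideoDiT: a video LLM combining a diffusion model with entity awareness to improve long video understanding. | TAMP | |
| 12 | First RAG, Second SEG: A Training-Free Paradigm for Camouflaged Object Detection | Proposes RAG-SEG, a training-free framework that addresses prompt generation in camouflaged object detection, achieving high performance and efficiency. | foundation model | |
| 13 | Comp-X: On Defining an Interactive Learned Image Compression Paradigm With Expert-driven LLM Agent | Proposes Comp-X, which uses an expert-driven LLM agent for intelligent interactive image compression. | large language model | |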

🔬 Pillar 3: Perception & Semantics (6 papers)

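| # | Title | One-line summary | Tags | 🔗 |
| 14 | Enhancing Novel View Synthesis from extremely sparse views with SfM-free 3D Gaussian Splatting Framework | Proposes an SfM-free 3D Gaussian splatting framework to enhance novel view synthesis from extremely sparse views. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 15 | Zero-shot Volumetric CT Super-Resolution using 3D Gaussian Splatting with Upsampled 2D X-ray Projection Priors | Proposes a zero-shot CT super-resolution method based on 3D Gaussian splatting and X-ray priors. | 3D gaussian splatting, gaussian splatting, splatting | |
| 16 | DriveSplat: Decoupled Driving Scene Reconstruction with Geometry-enhanced Partitioned Neural Gaussians | DriveSplat: proposes geometry-enhanced partitioned neural Gaussians for decoupled driving scene reconstruction. | 3D gaussian splatting, gaussian splatting, splatting | |
| 17 | Image-Conditioned 3D Gaussian Splat Quantization | Proposes an image-conditioned 3D Gaussian quantizer for efficient compression and post-archival editing of 3DGS scenes. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 18 | MeSS: City Mesh-Guided Outdoor Scene Generation with Cross-View Consistent Diffusion | MeSS: generates outdoor scenes using city mesh guidance and cross-view consistent diffusion. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 19 | MeSS: City Mesh-Guided Outdoor Scene Generation with Cross-View Consistent Diffusion | Proposes MeSS to address the lack of texture in city mesh models. | 3D gaussian splatting, 3DGS, gaussian splatting | |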

🔬 Pillar 2: RL & Architecture (6 papers)

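| # | Title | One-line summary | Tags | 🔗 |
| 20 | Task-Generalized Adaptive Cross-Domain Learning for Multimodal Image Fusion | AdaSFFuse: an adaptive cross-domain learning framework for multimodal image fusion that improves generalization. | Mamba, multimodal | |
| 21 | DesignCLIP: Multimodal Learning with CLIP for Design Patent Understanding | DesignCLIP: a multimodal learning framework that uses CLIP for design patent understanding. | contrastive learning, multimodal | |
| 22 | From Linearity to Non-Linearity: How Masked Autoencoders Capture Spatial Correlations | Studies how MAEs capture spatial correlations, revealing the link between hyperparameters and downstream task performance. | masked autoencoder, MAE, foundation model | |
| 23 | Adversarial Agent Behavior Learning in Autonomous Driving Using Deep Reinforcement Learning | Proposes a deep-reinforcement-learning-based method for learning adversarial agent behavior to improve autonomous driving safety. | reinforcement learning, deep reinforcement learning | |
| 24 | MapKD: Unlocking Prior Knowledge with Cross-Modal Distillation for Efficient Online HD Map Construction | Proposes MapKD, which achieves efficient online HD map construction through cross-modal knowledge distillation. | distillation, multimodal | |
| 25 | Representation Learning with Adaptive Superpixel Coding | Proposes ASC, a self-supervised Transformer model that improves representation learning through adaptive superpixel coding. | representation learning | |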

🔬 Pillar 4: Generative Motion (1 paper)

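| # | Title | One-line summary | Tags | 🔗 |
| 26 | Text-Driven 3D Hand Motion Generation from Sign Language Data | Proposes HandMDM, which uses large-scale sign language data for text-driven 3D hand motion generation. | motion diffusion model, motion diffusion, motion generation | |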

🔬 Pillar 6: Video Extraction (1 paper)

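| # | Title | One-line summary | Tags | 🔗 |
| 27 | MExECON: Multi-view Extended Explicit Clothed humans Optimized via Normal integration | MExECON: multi-view extended explicit clothed human optimization based on normal integration. | SMPL, SMPL-X | |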

🔬 Pillar 8: Physics-based Animation (1 paper)

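| # | Title | One-line summary | Tags | 🔗 |
| 28 | Reliable Multi-view 3D Reconstruction for 'Just-in-time' Edge Environments | Proposes a portfolio-theory-based multi-view 3D reconstruction method for just-in-time edge environments, improving system reliability under spatiotemporal disruptions. | spatiotemporal | |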
