cs.CV (2025-08-13)

📊 44 papers in total | 🔗 10 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (17 🔗3) · Pillar 3: Perception & Semantics (12 🔗3) · Pillar 2: RL & Architecture (6 🔗2) · Pillar 1: Robot Control (3 🔗2) · Pillar 8: Physics-based Animation (3) · Pillar 6: Video Extraction (2) · Pillar 7: Motion Retargeting (1)

🔬 Pillar 9: Embodied Foundation Models (17 papers)

# | Title | One-line summary | Tags | 🔗
1 | Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations | Proposes using multimodal large language models to improve semantic understanding in video recommendation systems | large language model, multimodal
2 | Empowering Morphing Attack Detection using Interpretable Image-Text Foundation Model | Proposes a multimodal learning approach to improve face morphing attack detection | foundation model, multimodal
3 | ViMoNet: A Multimodal Vision-Language Framework for Human Behavior Understanding from Motion and Video | Proposes ViMoNet to address multimodal data fusion in human behavior understanding | large language model, multimodal
4 | IAG: Input-aware Backdoor Attack on VLM-based Visual Grounding | Proposes IAG, an input-aware backdoor attack on VLM-based visual grounding systems | multimodal, visual grounding
5 | MANGO: Multimodal Attention-based Normalizing Flow Approach to Fusion Learning | Proposes MANGO to address feature capture in multimodal fusion learning | multimodal
6 | January Food Benchmark (JFB): A Public Benchmark Dataset and Evaluation Suite for Multimodal Food Analysis | Proposes the January Food Benchmark to address the lack of standardization in nutrition analysis | multimodal
7 | Multimodal Sheaf-based Network for Glioblastoma Molecular Subtype Prediction | Proposes a sheaf-based multimodal network for glioblastoma molecular subtype prediction | multimodal
8 | NEURAL: Attention-Guided Pruning for Unified Multimodal Resource-Constrained Clinical Evaluation | Proposes NEURAL to compress multimodal medical imaging data in resource-constrained clinical settings | multimodal
9 | The Brain Resection Multimodal Image Registration (ReMIND2Reg) 2025 Challenge | Presents the ReMIND2Reg challenge to address image registration in brain tumor surgery | multimodal
10 | CellSymphony: Deciphering the molecular and phenotypic orchestration of cells with single-cell pathomics | Proposes CellSymphony to integrate cell feature extraction with spatial transcriptomics data | foundation model, multimodal
11 | Echo-4o: Harnessing the Power of GPT-4o Synthetic Images for Improved Image Generation | Proposes Echo-4o to address data scarcity in image generation | foundation model, multimodal
12 | DINOv3 | Proposes DINOv3 to address feature map degradation in self-supervised learning | foundation model
13 | iWatchRoad: Scalable Detection and Geospatial Visualization of Potholes for Smart Cities | Proposes iWatchRoad for pothole detection on Indian roads | TAMP
14 | On the dynamic evolution of CLIP texture-shape bias and its relationship to human alignment and model robustness | Analyzes texture-shape bias during CLIP training and its relationship to human perception | multimodal
15 | Preacher: Paper-to-Video Agentic System | Proposes Preacher to address multiple limitations of paper-to-video generation | chain-of-thought
16 | Learning Spatial Decay for Vision Transformers | Proposes a spatial decay Transformer to improve spatial attention in Vision Transformers | large language model
17 | Gen-AFFECT: Generation of Avatar Fine-grained Facial Expressions with Consistent identiTy | Proposes Gen-AFFECT to address expression consistency in personalized avatar generation | multimodal

🔬 Pillar 3: Perception & Semantics (12 papers)

# | Title | One-line summary | Tags | 🔗
18 | A Survey on 3D Gaussian Splatting Applications: Segmentation, Editing, and Generation | Surveys applications of 3D Gaussian Splatting in segmentation, editing, and generation | 3D gaussian splatting, 3DGS, gaussian splatting
19 | GSFixer: Improving 3D Gaussian Splatting with Reference-Guided Video Diffusion Priors | Proposes GSFixer to address artifacts in 3D Gaussian Splatting reconstruction | 3D gaussian splatting, 3DGS, gaussian splatting
20 | CitySeg: A 3D Open Vocabulary Semantic Segmentation Foundation Model in City-scale Scenarios | Proposes CitySeg for city-scale point cloud semantic segmentation | open-vocabulary, open vocabulary, foundation model
21 | EntropyGS: An Efficient Entropy Coding on 3D Gaussian Splatting | Proposes EntropyGS for efficient coding of 3D Gaussian Splatting data | 3D gaussian splatting, 3DGS, gaussian splatting
22 | Surg-InvNeRF: Invertible NeRF for 3D tracking and reconstruction in surgical vision | Proposes an invertible NeRF for 3D tracking and reconstruction in surgical vision | NeRF, neural radiance field
23 | HumanGenesis: Agent-Based Geometric and Generative Modeling for Synthetic Human Dynamics | Proposes HumanGenesis to address geometric inconsistency and motion generalization in synthetic human dynamics | 3D gaussian splatting, gaussian splatting, splatting
24 | Distilling LLM Prior to Flow Model for Generalizable Agent's Imagination in Object Goal Navigation | Proposes the GOAL framework to address uncertainty in indoor object goal navigation | semantic map, large language model
25 | Semantic-aware DropSplat: Adaptive Pruning of Redundant Gaussians for 3D Aerial-View Segmentation | Proposes SAD-Splat to address ambiguity in 3D aerial-view semantic segmentation | scene understanding, foundation model
26 | PERSONA: Personalized Whole-Body 3D Avatar with Pose-Driven Deformations from a Single Image | Proposes the PERSONA framework to generate a personalized 3D human avatar from a single image | 3DGS, NeRF
27 | RayletDF: Raylet Distance Fields for Generalizable 3D Surface Reconstruction from Point Clouds or Gaussians | Proposes RayletDF for 3D surface reconstruction | 3DGS
28 | E-4DGS: High-Fidelity Dynamic Reconstruction from the Multi-view Event Cameras | Proposes E-4DGS to address lighting and blur problems in dynamic scene reconstruction | scene reconstruction
29 | SVG-Head: Hybrid Surface-Volumetric Gaussians for High-Fidelity Head Reconstruction and Real-Time Editing | Proposes SVG-Head for high-fidelity head reconstruction and real-time editing | implicit representation

🔬 Pillar 2: RL & Architecture (6 papers)

# | Title | One-line summary | Tags | 🔗
30 | SkySplat: Generalizable 3D Gaussian Splatting from Multi-Temporal Sparse Satellite Images | Proposes SkySplat for 3D reconstruction from multi-temporal sparse satellite images | MAE, 3D gaussian splatting, 3DGS
31 | HyperKD: Distilling Cross-Spectral Knowledge in Masked Autoencoders via Inverse Domain Shift with Spatial-Aware Masking and Specialized Loss | Proposes HyperKD to address knowledge distillation in hyperspectral remote sensing | representation learning, masked autoencoder, MAE
32 | Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory | Proposes M3-Agent to address long-term memory in multimodal agents | reinforcement learning, multimodal
33 | WeatherPrompt: Multi-modality Representation Learning for All-Weather Drone Visual Geo-Localization | Proposes WeatherPrompt to address weather interference in drone visual geo-localization | representation learning, contrastive learning
34 | BridgeTA: Bridging the Representation Gap in Knowledge Distillation via Teacher Assistant for Bird's Eye View Map Segmentation | Proposes BridgeTA to address the representation gap in knowledge distillation | teacher-student distillation
35 | SpeechForensics: Audio-Visual Speech Representation Learning for Face Forgery Detection | Proposes a joint audio-visual learning approach for face forgery detection | representation learning

🔬 Pillar 1: Robot Control (3 papers)

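# | Title | One-line summary | Tags | 🔗
36 | Physical Autoregressive Model for Robotic Manipulation without Action Pretraining | Proposes a physical autoregressive model to address data scarcity in robotic manipulation | manipulation
37 | RelayFormer: A Unified Local-Global Attention Framework for Scalable Image and Video Manipulation Localization | Proposes RelayFormer to localize manipulated regions in images and videos | manipulation
38 | LIA-X: Interpretable Latent Portrait Animator | Proposes LIA-X to address limited interpretability and controllability | manipulation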

🔬 Pillar 8: Physics-based Animation (3 papers)

# | Title | One-line summary | Tags | 🔗
39 | OneVAE: Joint Discrete and Continuous Optimization Helps Discrete Video VAE Train Better | Proposes OneVAE to address unstable training of discrete video VAEs | spatiotemporal
40 | Noise-adapted Neural Operator for Robust Non-Line-of-Sight Imaging | Proposes a noise-adapted neural operator for non-line-of-sight imaging | spatiotemporal
41 | Animate-X++: Universal Character Image Animation with Dynamic Backgrounds | Proposes Animate-X++ for character animation with dynamic backgrounds | character animation

🔬 Pillar 6: Video Extraction (2 papers)

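# | Title | One-line summary | Tags | 🔗
42 | GoViG: Goal-Conditioned Visual Navigation Instruction Generation | Proposes GoViG for vision-based navigation instruction generation | egocentric, large language model, multimodal
43 | Enhancing Monocular 3D Hand Reconstruction with Learned Texture Priors | Proposes a lightweight texture module to improve the accuracy of monocular 3D hand reconstruction | hand reconstruction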

🔬 Pillar 7: Motion Retargeting (1 paper)

# | Title | One-line summary | Tags | 🔗
44 | Episodic Memory Representation for Long-form Video Understanding | Proposes Video-EM to address context limits in long-form video understanding | spatial relationship, large language model, chain-of-thought
