cs.CV (2025-09-27)

📊 29 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (11 🔗1) · Pillar 2: RL & Architecture (7 🔗2) · Pillar 7: Motion Retargeting (4) · Pillar 3: Perception & Semantics (3) · Pillar 4: Generative Motion (3) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 9: Embodied Foundation Models (11 papers)

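# | Title | One-sentence summary | Tags
1 | Culture In a Frame: C$^3$B as a Comic-Based Benchmark for Multimodal Culturally Awareness | Proposes C$^3$B, a comic-based benchmark for evaluating multimodal cultural awareness. | large language model, multimodal
2 | Planning with Unified Multimodal Models | Proposes Uni-Plan, which uses unified multimodal models for long-horizon planning to improve decision-making. | large language model, multimodal
3 | DentVLM: A Multimodal Vision-Language Model for Comprehensive Dental Diagnosis and Enhanced Clinical Practice | DentVLM: a multimodal vision-language model for comprehensive dental diagnosis and enhanced clinical practice. | multimodal
4 | Decoupling Reasoning and Perception: An LLM-LMM Framework for Faithful Visual Reasoning | Proposes an LLM-LMM framework that decouples reasoning from perception to improve the reliability of visual reasoning. | large language model, multimodal, chain-of-thought
5 | Learning Regional Monsoon Patterns with a Multimodal Attention U-Net | Proposes a regional monsoon pattern learning framework based on a multimodal attention U-Net to improve rainfall prediction accuracy in India. | multimodal
6 | TATTOO: Training-free AesTheTic-aware Outfit recOmmendation | Proposes TATTOO, a training-free, aesthetic-aware outfit recommendation method. | large language model, multimodal, chain-of-thought
7 | GRAPE: Let GPRO Supervise Query Rewriting by Ranking for Retrieval | GRAPE: uses ranking to supervise query rewriting, improving retrieval performance under distribution shift. | large language model, multimodal
8 | Seeing Symbols, Missing Cultures: Probing Vision-Language Models' Reasoning on Fire Imagery and Cultural Meaning | Proposes a diagnostic framework built on fire-themed cultural imagery, revealing biases in vision-language models' cultural understanding. | multimodal
9 | SynDoc: A Hybrid Discriminative-Generative Framework for Enhancing Synthetic Domain-Adaptive Document Key Information Extraction | SynDoc: a hybrid discriminative-generative framework for enhancing synthetic domain-adaptive document key information extraction. | multimodal
10 | Self-Consistency as a Free Lunch: Reducing Hallucinations in Vision-Language Models via Self-Reflection | Proposes a self-reflection-based self-consistency method to reduce hallucinations in vision-language models. | instruction following
11 | Uncovering Intrinsic Capabilities: A Paradigm for Data Curation in Vision-Language Models | Proposes CADC, a capability-attributed data curation framework, to improve instruction-tuning efficiency for vision-language models. | multimodal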

🔬 Pillar 2: RL & Architecture (7 papers)

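# | Title | One-sentence summary | Tags
12 | Streamline pathology foundation model by cross-magnification distillation | Proposes XMAG, which builds a lightweight pathology foundation model via cross-magnification distillation to speed up clinical deployment. | distillation, foundation model
13 | Balanced Diffusion-Guided Fusion for Multimodal Remote Sensing Classification | Proposes a balanced diffusion-guided fusion framework to address modality imbalance in multimodal remote sensing classification. | Mamba, multimodal
14 | Reinforcement Learning-Based Prompt Template Stealing for Text-to-Image Models | Proposes RLStealer, which uses reinforcement learning to steal prompt templates of text-to-image models from a small number of images. | reinforcement learning, large language model, multimodal
15 | RestoRect: Degraded Image Restoration via Latent Rectified Flow & Feature Distillation | RestoRect: an image restoration method based on latent rectified flow and feature distillation. | distillation, feature matching
16 | CasPoinTr: Point Cloud Completion with Cascaded Networks and Knowledge Distillation | CasPoinTr: a point cloud completion framework based on cascaded networks and knowledge distillation. | distillation
17 | Enhancing Blind Face Restoration through Online Reinforcement Learning | Proposes a likelihood-regularized policy optimization framework based on online reinforcement learning to improve blind face restoration. | reinforcement learning
18 | C3-OWD: A Curriculum Cross-modal Contrastive Learning Framework for Open-World Detection | Proposes the C3-OWD framework to address robustness and diversity in open-world detection. | contrastive learning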

🔬 Pillar 7: Motion Retargeting (4 papers)

# | Title | One-sentence summary | Tags
19 | GeLoc3r: Enhancing Relative Camera Pose Regression with Geometric Consistency Regularization | GeLoc3r: enhances relative camera pose regression via geometric consistency regularization. | geometric consistency
20 | Sparse2Dense: A Keypoint-driven Generative Framework for Human Video Compression and Vertex Prediction | Sparse2Dense: a keypoint-driven generative framework for human video compression and vertex prediction. | geometric consistency, human motion
21 | Disentangling Static and Dynamic Information for Reducing Static Bias in Action Recognition | Proposes a method that disentangles static and dynamic information to reduce static bias in action recognition. | human motion
22 | CoPatch: Zero-Shot Referring Image Segmentation by Leveraging Untapped Spatial Knowledge in CLIP | CoPatch: achieves zero-shot referring image segmentation by exploiting untapped spatial knowledge in CLIP. | spatial relationship

🔬 Pillar 3: Perception & Semantics (3 papers)

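# | Title | One-sentence summary | Tags
23 | OracleGS: Grounding Generative Priors for Sparse-View Gaussian Splatting | OracleGS: uses generative priors for sparse-view Gaussian splatting to improve novel view synthesis quality. | 3D gaussian splatting, gaussian splatting, splatting
24 | Orientation-anchored Hyper-Gaussian for 4D Reconstruction from Casual Videos | Proposes OriGS, an orientation-anchored hyper-Gaussian method for high-quality 4D reconstruction from monocular videos. | 3D gaussian splatting, gaussian splatting, splatting
25 | FM-SIREN & FM-FINER: Nyquist-Informed Frequency Multiplier for Implicit Neural Representation with Periodic Activation | FM-SIREN/FINER: improves implicit neural representations with periodic activations via a Nyquist-informed frequency multiplier. | NeRF, neural radiance field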

🔬 Pillar 4: Generative Motion (3 papers)

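# | Title | One-sentence summary | Tags
26 | Vid-Freeze: Protecting Images from Malicious Image-to-Video Generation via Temporal Freezing | Vid-Freeze: protects images from malicious image-to-video generation via temporal freezing. | motion synthesis
27 | 3DPCNet: Pose Canonicalization for Robust Viewpoint-Invariant 3D Kinematic Analysis from Monocular RGB cameras | Proposes 3DPCNet to address 3D pose canonicalization from monocular RGB cameras. | physically plausible
28 | Generative Modeling of Shape-Dependent Self-Contact Human Poses | Proposes a shape-aware generative model of self-contact human poses to improve single-view pose estimation accuracy. | penetration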

🔬 Pillar 8: Physics-based Animation (1 paper)

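# | Title | One-sentence summary | Tags
29 | Evaluating point-light biological motion in multimodal large language models | The ActPLD benchmark reveals the shortcomings of multimodal large language models in understanding point-light biological motion. | spatiotemporal, large language model, multimodal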
