cs.CV (2025-09-27)

📊 29 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (11 🔗1) Pillar 2: RL & Architecture (7 🔗2) Pillar 7: Motion Retargeting (4) Pillar 3: Perception & Semantics (3) Pillar 4: Generative Motion (3) Pillar 8: Physics-based Animation (1)

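🔬 Pillar 9: Embodied Foundation Models (11 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
1	Culture In a Frame: C$^3$B as a Comic-Based Benchmark for Multimodal Culturally Awareness	Proposes C$^3$B, a comic-based benchmark for evaluating multimodal cultural awareness.	large language model multimodal
2	Planning with Unified Multimodal Models	Proposes Uni-Plan, which uses unified multimodal models for long-horizon planning to improve decision-making.	large language model multimodal
3	DentVLM: A Multimodal Vision-Language Model for Comprehensive Dental Diagnosis and Enhanced Clinical Practice	DentVLM: a multimodal vision-language model for comprehensive dental diagnosis and enhanced clinical practice.	multimodal
4	Decoupling Reasoning and Perception: An LLM-LMM Framework for Faithful Visual Reasoning	Proposes an LLM-LMM framework that decouples reasoning from perception to improve the reliability of visual reasoning.	large language model multimodal chain-of-thought
5	Learning Regional Monsoon Patterns with a Multimodal Attention U-Net	Proposes a multimodal attention U-Net framework for learning regional monsoon patterns, improving rainfall prediction accuracy in India.	multimodal
6	TATTOO: Training-free AesTheTic-aware Outfit recOmmendation	Proposes TATTOO, a training-free, aesthetic-aware outfit recommendation method.	large language model multimodal chain-of-thought
7	GRAPE: Let GPRO Supervise Query Rewriting by Ranking for Retrieval	GRAPE: supervises query rewriting with ranking signals to improve retrieval performance under distribution shift.	large language model multimodal	✅
8	Seeing Symbols, Missing Cultures: Probing Vision-Language Models' Reasoning on Fire Imagery and Cultural Meaning	Proposes a diagnostic framework built on fire-themed cultural imagery, revealing biases in vision-language models' cultural understanding.	multimodal
9	SynDoc: A Hybrid Discriminative-Generative Framework for Enhancing Synthetic Domain-Adaptive Document Key Information Extraction	SynDoc: a hybrid discriminative-generative framework for enhancing synthetic domain-adaptive document key information extraction.	multimodal
10	Self-Consistency as a Free Lunch: Reducing Hallucinations in Vision-Language Models via Self-Reflection	Proposes a self-reflection-based self-consistency method that reduces hallucinations in vision-language models.	instruction following
11	Uncovering Intrinsic Capabilities: A Paradigm for Data Curation in Vision-Language Models	Proposes CADC, a capability-attribution data curation framework that improves the efficiency of instruction tuning for vision-language models.	multimodal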

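🔬 Pillar 2: RL & Architecture (7 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
12	Streamline pathology foundation model by cross-magnification distillation	Proposes XMAG, which builds a lightweight pathology foundation model through cross-magnification distillation to speed up clinical deployment.	distillation foundation model
13	Balanced Diffusion-Guided Fusion for Multimodal Remote Sensing Classification	Proposes a balanced diffusion-guided fusion framework that addresses modality imbalance in multimodal remote sensing classification.	Mamba multimodal	✅
14	Reinforcement Learning-Based Prompt Template Stealing for Text-to-Image Models	Proposes RLStealer, which uses reinforcement learning to steal prompt templates of text-to-image models from a small number of images.	reinforcement learning large language model multimodal
15	RestoRect: Degraded Image Restoration via Latent Rectified Flow & Feature Distillation	RestoRect: an image restoration method based on latent rectified flow and feature distillation.	distillation feature matching
16	CasPoinTr: Point Cloud Completion with Cascaded Networks and Knowledge Distillation	CasPoinTr: a point cloud completion framework based on cascaded networks and knowledge distillation.	distillation
17	Enhancing Blind Face Restoration through Online Reinforcement Learning	Proposes a likelihood-regularized policy optimization framework based on online reinforcement learning to improve blind face restoration.	reinforcement learning
18	C3-OWD: A Curriculum Cross-modal Contrastive Learning Framework for Open-World Detection	Proposes the C3-OWD framework to address robustness and diversity in open-world detection.	contrastive learning	✅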

🔬 Pillar 7: Motion Retargeting (4 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
19	GeLoc3r: Enhancing Relative Camera Pose Regression with Geometric Consistency Regularization	GeLoc3r: enhances relative camera pose regression through geometric consistency regularization.	geometric consistency
20	Sparse2Dense: A Keypoint-driven Generative Framework for Human Video Compression and Vertex Prediction	Sparse2Dense: a keypoint-driven generative framework for human video compression and vertex prediction.	geometric consistency human motion
21	Disentangling Static and Dynamic Information for Reducing Static Bias in Action Recognition	Proposes a method that disentangles static and dynamic information to reduce static bias in action recognition.	human motion
22	CoPatch: Zero-Shot Referring Image Segmentation by Leveraging Untapped Spatial Knowledge in CLIP	CoPatch: leverages untapped spatial knowledge in CLIP for zero-shot referring image segmentation.	spatial relationship

🔬 Pillar 3: Perception & Semantics (3 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
23	OracleGS: Grounding Generative Priors for Sparse-View Gaussian Splatting	OracleGS: uses generative priors for sparse-view Gaussian splatting, improving novel view synthesis quality.	3D gaussian splatting gaussian splatting splatting
24	Orientation-anchored Hyper-Gaussian for 4D Reconstruction from Casual Videos	Proposes OriGS, an orientation-anchored hyper-Gaussian method for high-quality 4D reconstruction from monocular videos.	3D gaussian splatting gaussian splatting splatting
25	FM-SIREN & FM-FINER: Nyquist-Informed Frequency Multiplier for Implicit Neural Representation with Periodic Activation	FM-SIREN/FINER: improves implicit neural representations with periodic activations via a Nyquist-informed frequency multiplier.	NeRF neural radiance field

🔬 Pillar 4: Generative Motion (3 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
26	Vid-Freeze: Protecting Images from Malicious Image-to-Video Generation via Temporal Freezing	Vid-Freeze: protects images from malicious image-to-video generation through temporal freezing.	motion synthesis
27	3DPCNet: Pose Canonicalization for Robust Viewpoint-Invariant 3D Kinematic Analysis from Monocular RGB cameras	Proposes 3DPCNet to address 3D pose canonicalization from monocular RGB cameras.	physically plausible
28	Generative Modeling of Shape-Dependent Self-Contact Human Poses	Proposes a shape-aware generative model of self-contact human poses, improving single-view pose estimation accuracy.	penetration

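🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
29	Evaluating point-light biological motion in multimodal large language models	The ActPLD benchmark reveals the shortcomings of multimodal large language models in understanding point-light biological motion.	spatiotemporal large language model multimodal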
