cs.CV (2025-12-06)

📊 18 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (7 🔗2) · Pillar 3: Perception & Semantics (4 🔗1) · Pillar 2: RL & Architecture (3 🔗1) · Pillar 7: Motion Retargeting (2 🔗1) · Pillar 4: Generative Motion (1 🔗1) · Pillar 6: Video Extraction (1)

🔬 Pillar 9: Embodied Foundation Models (7 papers)

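# | Title | One-line summary | Tags | 🔗
1 | Unleashing the Intrinsic Visual Representation Capability of Multimodal Large Language Models | Proposes the LaVer framework, which strengthens the visual representation capability of multimodal large language models through latent visual reconstruction. | large language model, multimodal
2 | OmniSafeBench-MM: A Unified Benchmark and Toolbox for Multimodal Jailbreak Attack-Defense Evaluation | OmniSafeBench-MM: a unified benchmark and toolbox for evaluating jailbreak attacks and defenses on multimodal large models. | large language model, multimodal
3 | Sanvaad: A Multimodal Accessibility Framework for ISL Recognition and Voice-Based Interaction | Sanvaad: a multimodal accessibility framework for ISL recognition and voice-based interaction. | multimodal
4 | Beyond Hallucinations: A Multimodal-Guided Task-Aware Generative Image Compression for Ultra-Low Bitrate | Proposes a multimodal-guided, task-aware generative image compression framework that addresses semantic deviation at ultra-low bitrates. | multimodal
5 | Towards Stable Cross-Domain Depression Recognition under Missing Modalities | Proposes the SCD-MLLM framework to keep cross-domain depression recognition stable when modalities are missing. | large language model, multimodal
6 | RefBench-PRO: Perceptual and Reasoning Oriented Benchmark for Referring Expression Comprehension | Proposes the RefBench-PRO benchmark for evaluating the perception and reasoning abilities of multimodal large models in referring expression comprehension. | large language model
7 | Language-driven Fine-grained Retrieval | Proposes the LaFG framework, which uses language models to improve cross-category generalization in fine-grained image retrieval. | large language model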

🔬 Pillar 3: Perception & Semantics (4 papers)

# | Title | One-line summary | Tags | 🔗
8 | TriaGS: Differentiable Triangulation-Guided Geometric Consistency for 3D Gaussian Splatting | TriaGS: 3D Gaussian splatting with geometric consistency guided by differentiable triangulation. | 3D gaussian splatting, gaussian splatting, splatting
9 | AGORA: Adversarial Generation Of Real-time Animatable 3D Gaussian Head Avatars | AGORA: proposes real-time controllable 3D Gaussian head avatars based on generative adversarial networks. | 3D gaussian splatting, 3DGS, gaussian splatting
10 | GNC-Pose: Geometry-Aware GNC-PnP for Accurate 6D Pose Estimation | GNC-Pose: a geometry-aware GNC-PnP method for accurate 6D pose estimation. | 6D pose estimation, feature matching
11 | HuPrior3R: Incorporating Human Priors for Better 3D Dynamic Reconstruction from Monocular Videos | Proposes HuPrior3R, which incorporates human priors to improve 3D dynamic reconstruction from monocular videos. | depth estimation, monocular depth, SMPL

🔬 Pillar 2: RL & Architecture (3 papers)

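# | Title | One-line summary | Tags | 🔗
12 | VG-Refiner: Towards Tool-Refined Referring Grounded Reasoning via Agentic Reinforcement Learning | Proposes VG-Refiner, which uses agentic reinforcement learning to optimize the use of tool feedback and improve referring grounded reasoning. | reinforcement learning, multimodal
13 | ReCAD: Reinforcement Learning Enhanced Parametric CAD Model Generation with Vision-Language Models | ReCAD: reinforcement-learning-enhanced parametric CAD model generation built on vision-language models. | reinforcement learning, multimodal
14 | S2WMamba: A Spectral-Spatial Wavelet Mamba for Pansharpening | Proposes S2WMamba, which combines spectral-spatial wavelet transforms with Mamba modules for efficient remote sensing image fusion. | Mamba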

🔬 Pillar 7: Motion Retargeting (2 papers)

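# | Title | One-line summary | Tags | 🔗
15 | Exploiting Spatiotemporal Properties for Efficient Event-Driven Human Pose Estimation | Proposes an event-camera human pose estimation method that exploits spatiotemporal properties to improve efficiency and accuracy. | human motion, spatiotemporal
16 | VAD-Net: Multidimensional Facial Expression Recognition in Intelligent Education System | VAD-Net: multidimensional facial expression recognition in intelligent education systems, proposing VAD annotation and introducing orthogonal convolution. | motion prediction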

🔬 Pillar 4: Generative Motion (1 paper)

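# | Title | One-line summary | Tags | 🔗
17 | DragMesh: Interactive 3D Generation Made Easy | DragMesh: proposes a decoupled motion generation framework for real-time, interactive generation of 3D object articulation. | motion generation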

🔬 Pillar 6: Video Extraction (1 paper)

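# | Title | One-line summary | Tags | 🔗
18 | Opinion: Learning Intuitive Physics May Require More than Visual Data | The study indicates that large amounts of visual data, or data from a child-like viewpoint, are not enough on their own for models to acquire intuitive physics. | egocentric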
