cs.CV (2025-01-03)

📊 24 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (9 🔗2) · Pillar 3: Perception & Semantics (7 🔗1) · Pillar 2: RL & Architecture (4 🔗3) · Pillar 1: Robot Control (2) · Pillar 4: Generative Motion (1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 9: Embodied Foundation Models (9 papers)

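# | Title | Key point | Tags | 🔗
1 | Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models | Proposes I-FAS, which uses multimodal large language models to improve the generalization and interpretability of face anti-spoofing. | large language model, multimodal |
2 | Multimodal classification of forest biodiversity potential from 2D orthophotos and 3D airborne laser scanning point clouds | Proposes a deep-learning-based multimodal fusion method that uses orthophotos and LiDAR data to assess forest biodiversity potential. | multimodal |
3 | Google is all you need: Semi-Supervised Transfer Learning Strategy For Light Multimodal Multi-Task Classification Model | Proposes a semi-supervised transfer learning strategy for a lightweight multimodal multi-task classification model, improving image labeling accuracy. | multimodal |
4 | VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction | VITA-1.5: a multimodal large model for real-time vision and speech interaction, targeting GPT-4o-level performance. | large language model, multimodal |
5 | Virgo: A Preliminary Exploration on Reproducing o1-like MLLM | Virgo: fine-tunes an MLLM on textual long-form thought data to explore multimodal slow-thinking reasoning. | large language model, multimodal |
6 | HLV-1K: A Large-scale Hour-Long Video Benchmark for Time-Specific Long Video Understanding | Builds HLV-1K, a large-scale hour-long video benchmark, to advance time-aware long video understanding. | large language model, multimodal |
7 | AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs | AVTrustBench: assesses and improves the reliability and robustness of audio-visual large language models. | large language model |
8 | MoEE: Mixture of Emotion Experts for Audio-Driven Portrait Animation | Proposes the MoEE model and the DH-FaceEmoVid-150 dataset for generating audio-driven portrait animation with complex emotions. | multimodal |
9 | LogicAD: Explainable Anomaly Detection via VLM-based Text Feature Extraction | LogicAD: explainable anomaly detection based on VLM text feature extraction. | multimodal |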

🔬 Pillar 3: Perception & Semantics (7 papers)

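# | Title | Key point | Tags | 🔗
10 | CrossView-GS: Cross-view Gaussian Splatting For Large-scale Scene Reconstruction | Proposes CrossView-GS to address the difficulty of optimizing 3DGS in cross-view reconstruction of large-scale scenes. | 3D gaussian splatting, 3DGS, gaussian splatting |
11 | PG-SAG: Parallel Gaussian Splatting for Fine-Grained Large-Scale Urban Buildings Reconstruction via Semantic-Aware Grouping | Proposes PG-SAG, which reconstructs large-scale urban buildings with parallel Gaussian splatting via semantic-aware grouping. | 3D gaussian splatting, 3DGS, gaussian splatting |
12 | DreamMask: Boosting Open-vocabulary Panoptic Segmentation with Synthetic Data | DreamMask: uses synthetic data to improve open-vocabulary panoptic segmentation. | open-vocabulary, open vocabulary |
13 | Cloth-Splatting: 3D Cloth State Estimation from RGB Supervision | Cloth-Splatting: 3D cloth state estimation from RGB supervision. | 3D gaussian splatting, gaussian splatting, splatting |
14 | SafeAug: Safety-Critical Driving Data Augmentation from Naturalistic Datasets | SafeAug: augments safety-critical autonomous driving data from naturalistic datasets. | depth estimation |
15 | VideoLifter: Lifting Videos to 3D with Fast Hierarchical Stereo Alignment | VideoLifter: lifts videos to 3D models using fast hierarchical stereo alignment. | scene understanding |
16 | D$^3$-Human: Dynamic Disentangled Digital Human from Monocular Video | D$^3$-Human: proposes a disentangled dynamic digital human reconstruction method that addresses clothing occlusion in monocular video. | implicit representation |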

🔬 Pillar 2: RL & Architecture (4 papers)

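# | Title | Key point | Tags | 🔗
17 | A Separable Self-attention Inspired by the State Space Model for Computer Vision | Inspired by state space models, proposes a separable self-attention mechanism for computer vision tasks. | Mamba, SSM, state space model |
18 | MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders | Proposes MoVE-KD, which uses knowledge distillation to transfer the capabilities of multiple visual encoders into a single efficient VLM. | distillation, foundation model |
19 | 3D Cloud reconstruction through geospatially-aware Masked Autoencoders | Proposes a geospatially-aware masked autoencoder for 3D cloud reconstruction. | masked autoencoder, MAE |
20 | Merging Context Clustering with Visual State Space Models for Medical Image Segmentation | Proposes CCViM, which merges context clustering with visual state space models to improve medical image segmentation. | Mamba, state space model |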

🔬 Pillar 1: Robot Control (2 papers)

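# | Title | Key point | Tags | 🔗
21 | Aesthetic Matters in Music Perception for Image Stylization: A Emotion-driven Music-to-Visual Manipulation | EmoMV: proposes an emotion-driven music-to-visual image stylization method. | manipulation, multimodal |
22 | IAM: Enhancing RGB-D Instance Segmentation with New Benchmarks | Proposes IAM, a new benchmark for RGB-D instance segmentation, along with an effective data fusion method, improving scene understanding. | manipulation, scene understanding |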

🔬 Pillar 4: Generative Motion (1 paper)

# | Title | Key point | Tags | 🔗
23 | JoyGen: Audio-Driven 3D Depth-Aware Talking-Face Video Editing | JoyGen: proposes a depth-aware, audio-driven 3D talking-face video editing framework. | motion generation |

🔬 Pillar 8: Physics-based Animation (1 paper)

# | Title | Key point | Tags | 🔗
24 | VidFormer: A novel end-to-end framework fused by 3DCNN and Transformer for Video-based Remote Physiological Measurement | Proposes the VidFormer framework for video-based remote physiological measurement. | spatiotemporal |
