cs.CV (2025-01-24)

📊 17 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (7 🔗1) · Pillar 9: Embodied Foundation Models (5) · Pillar 2: RL Algorithms & Architecture (4 🔗2) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 3: Spatial Perception & Semantics (7 papers)

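# | Title | One-line summary | Tags | 🔗
1 | Trick-GS: A Balanced Bag of Tricks for Efficient Gaussian Splatting | Trick-GS: an efficient Gaussian splatting method for resource-constrained devices | gaussian splatting, splatting
2 | Micro-macro Wavelet-based Gaussian Splatting for 3D Reconstruction from Unconstrained Images | Proposes a micro-macro method based on wavelet-transform Gaussian splatting for 3D reconstruction from unconstrained images | gaussian splatting, splatting
3 | Scene Understanding Enabled Semantic Communication with Open Channel Coding | Proposes OpenSC, a semantic communication system that combines scene understanding, LLMs, and open channel coding | scene understanding, large language model
4 | Scalable Benchmarking and Robust Learning for Noise-Free Ego-Motion and 3D Reconstruction from Noisy Video | Proposes CorrGS for robust ego-motion estimation and 3D reconstruction from noisy video | gaussian splatting, splatting
5 | Dense-SfM: Structure from Motion with Dense Consistent Matching | Dense-SfM: an accurate 3D reconstruction framework that combines dense matching with Gaussian splatting | gaussian splatting, splatting
6 | SyncAnimation: A Real-Time End-to-End Framework for Audio-Driven Human Pose and Talking Head Animation | SyncAnimation: the first NeRF-based, real-time, end-to-end framework for audio-driven face and full-body animation | NeRF
7 | Rethinking Encoder-Decoder Flow Through Shared Structures | Proposes shared structures ("banks") that augment the decoder and improve Transformer depth estimation performance | depth estimation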

🔬 Pillar 9: Embodied Foundation Models (5 papers)

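# | Title | One-line summary | Tags | 🔗
8 | Leveraging ChatGPT's Multimodal Vision Capabilities to Rank Satellite Images by Poverty Level: Advancing Tools for Social Science Research | Uses ChatGPT's multimodal vision capabilities to assess poverty levels from satellite images, advancing tools for social science research | large language model, multimodal
9 | Triple Path Enhanced Neural Architecture Search for Multimodal Fake News Detection | Proposes the MUSE model, which tackles multimodal fake news detection with triple-path enhanced neural architecture search | multimodal
10 | Measuring and Mitigating Hallucinations in Vision-Language Dataset Generation for Remote Sensing | Proposes a method that incorporates map information when generating remote sensing image-text datasets, mitigating hallucinations | large language model, multimodal
11 | Global Semantic-Guided Sub-image Feature Weight Allocation in High-Resolution Large Vision-Language Models | Proposes the GSWA module, which dynamically allocates semantic weights to sub-images in high-resolution LVLMs to improve visual understanding | multimodal
12 | Dynamic Token Reduction during Generation for Vision Language Models | Proposes the Dynamic Rate (DyRate) method to address visual token redundancy during generation in vision-language models | multimodal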

🔬 Pillar 2: RL Algorithms & Architecture (4 papers)

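# | Title | One-line summary | Tags | 🔗
13 | HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation | HERMES: a unified self-driving world model for simultaneous 3D scene understanding and generation | world model, scene understanding, large language model
14 | Enhancing Multimodal Entity Linking with Jaccard Distance-based Conditional Contrastive Learning and Contextual Visual Augmentation | Proposes a multimodal entity linking method based on Jaccard distance-based conditional contrastive learning and contextual visual augmentation | contrastive learning, multimodal
15 | Surface Vision Mamba: Leveraging Bidirectional State Space Model for Efficient Spherical Manifold Representation | Proposes Surface Vision Mamba for efficient spherical manifold representation and neurodevelopmental phenotype regression | Mamba, state space model
16 | Dreamweaver: Learning Compositional World Models from Pixels | Dreamweaver: a method for learning compositional world models from pixels, applied to video decomposition and future prediction | world model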

🔬 Pillar 8: Physics-based Animation (1 paper)

# | Title | One-line summary | Tags | 🔗
17 | ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations | Proposes ReferDINO to address visual grounding in video object segmentation | spatiotemporal, visual grounding
