cs.CV (2025-07-27)

📊 20 papers in total | 🔗 8 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (9 🔗5)
Pillar 3: Perception & Semantics (5 🔗1)
Pillar 1: Robot Control (3 🔗1)
Pillar 8: Physics-based Animation (1)
Pillar 2: RL & Architecture (1 🔗1)
Pillar 4: Generative Motion (1)

🔬 Pillar 9: Embodied Foundation Models (9 papers)

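| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 1 | Player-Centric Multimodal Prompt Generation for Large Language Model Based Identity-Aware Basketball Video Captioning | Proposes an LLM-based, player-centric multimodal prompt generation network for identity-aware basketball video captioning. | large language model, multimodal |
| 2 | When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios | First survey of multimodal long-context token compression, covering images, videos, and audio. | large language model, multimodal |
| 3 | Can Foundation Models Predict Fitness for Duty? | Uses iris images and pretrained models to predict whether a person is fit for duty. | foundation model |
| 4 | ModalFormer: Multimodal Transformer for Low-Light Image Enhancement | Proposes ModalFormer for low-light image enhancement. | multimodal |
| 5 | L-MCAT: Unpaired Multimodal Transformer with Contrastive Attention for Label-Efficient Satellite Image Classification | L-MCAT: a multimodal Transformer with contrastive attention for weakly supervised satellite image classification. | multimodal |
| 6 | MIRepNet: A Pipeline and Foundation Model for EEG-Based Motor Imagery Classification | MIRepNet: a dedicated pretrained model and pipeline for EEG motor imagery classification. | foundation model |
| 7 | Motion-example-controlled Co-speech Gesture Generation Leveraging Large Language Models | Proposes the MECo framework, which uses large language models for motion-example-controlled co-speech gesture generation. | large language model |
| 8 | Trust the Model: Compact VLMs as In-Context Judges for Image-Text Data Quality | Proposes an image-text data quality filtering framework based on compact VLMs to improve training data quality. | large language model, multimodal |
| 9 | SAMwave: Wavelet-Driven Feature Enrichment for Effective Adaptation of Segment Anything Model | SAMwave: enriches features with wavelet transforms to improve SAM's adaptation to complex tasks. | foundation model |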

🔬 Pillar 3: Perception & Semantics (5 papers)

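| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 10 | Decomposing Densification in Gaussian Splatting for Faster 3D Scene Reconstruction | Proposes a global-to-local Gaussian growth strategy that speeds up 3D scene reconstruction and improves rendering quality. | 3D gaussian splatting, gaussian splatting, splatting |
| 11 | NeuroVoxel-LM: Language-Aligned 3D Perception via Dynamic Voxelization and Meta-Embedding | Proposes NeuroVoxel-LM to address inefficient feature extraction from sparse point clouds. | NeRF, neural radiance field, large language model |
| 12 | VESPA: Towards un(Human)supervised Open-World Pointcloud Labeling for Autonomous Driving | VESPA: an unsupervised open-world point cloud labeling method for autonomous driving. | scene understanding, open-vocabulary, open vocabulary |
| 13 | From Gallery to Wrist: Realistic 3D Bracelet Insertion in Videos | Proposes a hybrid framework combining 3DGS and diffusion models for realistic 3D bracelet insertion in videos. | 3D gaussian splatting, 3DGS, gaussian splatting |
| 14 | Solving Scene Understanding for Autonomous Navigation in Unstructured Environments | Proposes a deep-learning-based scene understanding method for autonomous navigation in unstructured environments. | scene understanding |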

🔬 Pillar 1: Robot Control (3 papers)

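| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 15 | Detecting Visual Information Manipulation Attacks in Augmented Reality: A Multimodal Semantic Reasoning Approach | Proposes a multimodal semantic reasoning framework to detect visual information manipulation attacks in augmented reality. | manipulation, multimodal |
| 16 | LRR-Bench: Left, Right or Rotate? Vision-Language models Still Struggle With Spatial Understanding Tasks | LRR-Bench: reveals the shortcomings of vision-language models on spatial understanding tasks. | humanoid, humanoid robot, manipulation |
| 17 | RESCUE: Crowd Evacuation Simulation via Controlling SDM-United Characters | RESCUE: crowd evacuation simulation by controlling SDM-united characters. | gait control |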

🔬 Pillar 8: Physics-based Animation (1 paper)

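| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 18 | MagicAnime: A Hierarchically Annotated, Multimodal and Multitasking Dataset with Benchmarks for Cartoon Animation Generation | MagicAnime: a hierarchically annotated, multimodal, multitask dataset with benchmarks for cartoon animation generation. | character animation, multimodal |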

🔬 Pillar 2: RL & Architecture (1 paper)

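| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 19 | MambaMap: Online Vectorized HD Map Construction using State Space Model | MambaMap: online vectorized HD map construction using a state space model. | Mamba, state space model |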

🔬 Pillar 4: Generative Motion (1 paper)

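| # | Title | One-sentence summary | Tags |
|---|---|---|---|
| 20 | PUMPS: Skeleton-Agnostic Point-based Universal Motion Pre-Training for Synthesis in Human Motion Tasks | PUMPS: a skeleton-agnostic, point-based universal motion pre-training model for human motion synthesis. | motion synthesis, character animation |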
