cs.CV (2025-12-21)
📊 18 papers in total | 🔗 5 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (7 🔗3)
Pillar 3: Perception & Semantics (4)
Pillar 2: RL & Architecture (4 🔗1)
Pillar 4: Generative Motion (1)
Pillar 8: Physics-based Animation (1 🔗1)
Pillar 1: Robot Control (1)
🔬 Pillar 9: Embodied Foundation Models (7 papers)
🔬 Pillar 3: Perception & Semantics (4 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | EcoSplat: Efficiency-controllable Feed-forward 3D Gaussian Splatting from Multi-view Images | EcoSplat: an efficiency-controllable feed-forward 3D Gaussian Splatting reconstruction method | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 9 | Geometric-Photometric Event-based 3D Gaussian Ray Tracing | Proposes event-based geometric-photometric 3D Gaussian ray tracing, improving the accuracy and efficiency of event-camera 3D reconstruction | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 10 | A Study of Finetuning Video Transformers for Multi-view Geometry Tasks | Fine-tunes video Transformers to solve multi-view geometry tasks, reaching SOTA performance | depth estimation, optical flow, foundation model | | |
| 11 | SplatBright: Generalizable Low-Light Scene Reconstruction from Sparse Views via Physically-Guided Gaussian Enhancement | SplatBright: generalizable low-light scene reconstruction from sparse views via physically guided Gaussian enhancement | scene reconstruction | | |
🔬 Pillar 2: RL & Architecture (4 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 12 | InSight-o3: Empowering Multimodal Foundation Models with Generalized Visual Search | InSight-o3: enhances multimodal foundation models through generalized visual search | reinforcement learning, foundation model, multimodal | ✅ | |
| 13 | brat: Aligned Multi-View Embeddings for Brain MRI Analysis | Proposes brat, an aligned multi-view embedding framework for brain MRI analysis | representation learning, feature matching, foundation model | | |
| 14 | Enhancing Medical Large Vision-Language Models via Alignment Distillation | Proposes the MEDALIGN framework, which improves the visual understanding of medical large vision-language models through alignment distillation | representation learning, distillation | | |
| 15 | Rectification Reimagined: A Unified Mamba Model for Image Correction and Rectangling with Prompts | Proposes UniRect, a unified framework that uses a Mamba model for image correction and rectangling | Mamba | | |
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 16 | EchoMotion: Unified Human Video and Motion Generation via Dual-Modality Diffusion Transformer | EchoMotion: unified human video and motion generation via a dual-modality diffusion Transformer | motion generation, human motion | | |
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 17 | CrashChat: A Multimodal Large Language Model for Multitask Traffic Crash Video Analysis | Proposes CrashChat, a multimodal large language model for multitask traffic crash video analysis | spatiotemporal, large language model, multimodal | ✅ | |
🔬 Pillar 1: Robot Control (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 18 | VizDefender: Unmasking Visualization Tampering through Proactive Localization and Intent Inference | VizDefender: uncovers visualization tampering through proactive localization and intent inference | manipulation, large language model, multimodal | | |