cs.CV (2025-01-21)

📊 25 papers in total | 🔗 4 with code

🎯 Interest area navigation

Pillar 3: Perception & Semantics (9 🔗1) · Pillar 9: Embodied Foundation Models (7 🔗1) · Pillar 2: RL & Architecture (7 🔗2) · Pillar 7: Motion Retargeting (1) · Pillar 4: Generative Motion (1)

🔬 Pillar 3: Perception & Semantics (9 papers)

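# | Title | Key point | Tags | 🔗
1 | HAC++: Towards 100X Compression of 3D Gaussian Splatting | HAC++: achieves 100x compression of 3D Gaussian Splatting with improved rendering fidelity | 3D gaussian splatting, 3DGS, gaussian splatting |
2 | Survey on Monocular Metric Depth Estimation | Surveys monocular metric depth estimation as a way to address consistency in depth prediction | visual SLAM, depth estimation, monocular depth |
3 | Video Depth Anything: Consistent Depth Estimation for Super-Long Videos | Video Depth Anything: consistent depth estimation for super-long videos | depth estimation, monocular depth, Depth Anything |
4 | Towards Affordance-Aware Articulation Synthesis for Rigged Objects | Proposes A3Syn for automatic synthesis of embodied poses for open-domain rigged objects | affordance, affordance-aware |
5 | GSVC: Efficient Video Representation and Compression Through 2D Gaussian Splatting | Proposes GSVC for efficient video representation and compression with 2D Gaussian splats | gaussian splatting, splatting |
6 | Fast Underwater Scene Reconstruction using Multi-View Stereo and Physical Imaging | Proposes a fast underwater multi-view stereo reconstruction method based on physical imaging, improving reconstruction quality and efficiency | depth estimation, NeRF, neural radiance field |
7 | DARB-Splatting: Generalizing Splatting with Decaying Anisotropic Radial Basis Functions | Proposes a splatting method based on decaying anisotropic radial basis functions (DARB) that speeds up training and reduces memory consumption | 3D gaussian splatting, gaussian splatting, splatting |
8 | Learning segmentation from point trajectories | Learns video segmentation from point trajectories without additional supervision | optical flow |
9 | Continuous 3D Perception Model with Persistent State | Proposes CUT3R, a recurrent model with persistent state for continuous 3D perception tasks | scene reconstruction |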

🔬 Pillar 9: Embodied Foundation Models (7 papers)

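# | Title | Key point | Tags | 🔗
10 | VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model | VARGPT: a visual autoregressive multimodal large language model that unifies understanding and generation | large language model, multimodal, instruction following |
11 | EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents | EmbodiedEval: proposes a comprehensive interactive benchmark for evaluating multimodal LLMs on embodied AI tasks | embodied AI, large language model, multimodal |
12 | InsTALL: Context-aware Instructional Task Assistance with Multi-modal Large Language Models | InsTALL: a context-aware instructional task assistant built on multimodal large language models | large language model, multimodal |
13 | Explainability for Vision Foundation Models: A Survey | Survey of research progress and challenges in explainability for vision foundation models | foundation model |
14 | ComposeAnyone: Controllable Layout-to-Human Generation with Decoupled Multimodal Conditions | ComposeAnyone: proposes a controllable layout-to-human generation method with decoupled multimodal conditions | multimodal |
15 | Are Traditional Deep Learning Model Approaches as Effective as a Retinal-Specific Foundation Model for Ocular and Systemic Disease Detection? | Evaluates the retinal-specific foundation model RETFound against traditional deep learning models for ocular and systemic disease detection | foundation model |
16 | MMVU: Measuring Expert-Level Multi-Discipline Video Understanding | MMVU: proposes an expert-level multi-discipline video understanding benchmark that challenges the knowledge reasoning of general-purpose models in specialized domains | foundation model, multimodal |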

🔬 Pillar 2: RL & Architecture (7 papers)

# | Title | Key point | Tags | 🔗
17 | High-dimensional multimodal uncertainty estimation by manifold alignment: Application to 3D right ventricular strain computations | Proposes a high-dimensional multimodal uncertainty estimation method based on manifold alignment, applied to 3D right ventricular strain computation | representation learning, multimodal |
18 | InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling | InternVideo2.5 strengthens video multimodal large language models through long and rich context modeling | direct preference optimization, spatiotemporal, large language model |
19 | Contrastive Masked Autoencoders for Character-Level Open-Set Writer Identification | Proposes the CMAE model for character-level open-set writer identification | representation learning, masked autoencoder, MAE |
20 | Memory Storyboard: Leveraging Temporal Segmentation for Streaming Self-Supervised Learning from Egocentric Videos | Proposes Memory Storyboard, which uses temporal segmentation for streaming self-supervised learning from egocentric videos | contrastive learning, egocentric |
21 | DNRSelect: Active Best View Selection for Deferred Neural Rendering | DNRSelect: an active best-view selection method for deferred neural rendering | reinforcement learning, NeRF, geometric consistency |
22 | SMamba: Sparse Mamba for Event-based Object Detection | Proposes SMamba, a sparse Mamba architecture that improves the efficiency and accuracy of event-camera object detection | Mamba, spatiotemporal |
23 | InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model | Proposes InternLM-XComposer2.5-Reward, a simple yet effective multimodal reward model for improving LVLM generation quality | reinforcement learning, PPO, instruction following |

🔬 Pillar 7: Motion Retargeting (1 paper)

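# | Title | Key point | Tags | 🔗
24 | Cinepro: Robust Training of Foundation Models for Cancer Detection in Prostate Ultrasound Cineloops | Cinepro: uses robust training to improve foundation model performance for cancer detection in prostate ultrasound cineloops | spatial relationship, foundation model |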

🔬 Pillar 4: Generative Motion (1 paper)

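# | Title | Key point | Tags | 🔗
25 | Regressor-Guided Generative Image Editing Balances User Emotions to Reduce Time Spent Online | Proposes regressor-guided generative image editing that balances user emotions to reduce time spent online | classifier-free guidance |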
