cs.CV(2025-02-04)

📊 32 papers in total | 🔗 9 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (13 🔗4)
Pillar 9: Embodied Foundation Models (8 🔗3)
Pillar 3: Perception & Semantics (3)
Pillar 4: Generative Motion (3 🔗1)
Pillar 1: Robot Control (3 🔗1)
Pillar 6: Video Extraction (1)
Pillar 8: Physics-based Animation (1)

🔬 Pillar 2: RL & Architecture (13 papers)

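# | Title | Key point | Tags | 🔗
1 | Mosaic3D: Foundation Dataset and Model for Open-Vocabulary 3D Segmentation | Proposes the Mosaic3D dataset and model for open-vocabulary 3D scene segmentation. | contrastive learning, scene understanding, open-vocabulary
2 | LAYOUTDREAMER: Physics-guided Layout for Text-to-3D Compositional Scene Generation | LayoutDreamer: proposes a physics-guided layout method for text-to-3D compositional scene generation. | dreamer, 3D gaussian splatting, 3DGS
3 | 3D Foundation Model for Generalizable Disease Detection in Head Computed Tomography | Proposes FM-CT, a 3D foundation model for disease detection in head CT images. | distillation, foundation model
4 | Particle Trajectory Representation Learning with Masked Point Modeling | Proposes PoLAr-MAE, which uses masked point modeling for self-supervised particle trajectory representation learning on LArTPC images. | representation learning, masked autoencoder, MAE
5 | AAD-DCE: An Aggregated Multimodal Attention Mechanism for Early and Late Dynamic Contrast Enhanced Prostate MRI Synthesis | Proposes AAD-DCE, which uses a multimodal attention mechanism to synthesize early and late dynamic contrast-enhanced prostate MRI images. | MAE, multimodal
6 | Mind the Gap: Evaluating Patch Embeddings from General-Purpose and Histopathology Foundation Models for Cell Segmentation and Classification | Compares general-purpose and histopathology pre-trained models, evaluating the performance gap of their patch embeddings for cell segmentation and classification. | representation learning, foundation model
7 | MaintaAvatar: A Maintainable Avatar Based on Neural Radiance Fields by Continual Learning | Proposes MaintaAvatar, which maintains a NeRF avatar through continual learning and addresses catastrophic forgetting under appearance and pose changes. | distillation, NeRF, neural radiance field
8 | MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm | MotionLab: unifies human motion generation and editing via the Motion-Condition-Motion paradigm. | curriculum learning, motion generation
9 | IPO: Iterative Preference Optimization for Text-to-Video Generation | Proposes Iterative Preference Optimization (IPO) to improve the video quality of text-to-video generation models. | direct preference optimization, large language model, foundation model
10 | One Diffusion Step to Real-World Super-Resolution via Flow Trajectory Distillation | Proposes FluxSR, which achieves one-step real-world image super-resolution via flow trajectory distillation. | flow matching, distillation
11 | DAMA: Data- and Model-aware Alignment of Multi-modal LLMs | DAMA: a data- and model-aware alignment method for multimodal LLMs that improves trustworthiness and effectiveness. | DPO, direct preference optimization, large language model
12 | Controllable Video Generation with Provable Disentanglement | Proposes CoVoGAN, which achieves controllable video generation through provable disentanglement. | latent dynamics, spatiotemporal
13 | UNIP: Rethinking Pre-trained Attention Patterns for Infrared Semantic Segmentation | UNIP: rethinks pre-trained attention patterns for infrared semantic segmentation. | MAE, distillation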

🔬 Pillar 9: Embodied Foundation Models (8 papers)

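# | Title | Key point | Tags | 🔗
14 | LV-XAttn: Distributed Cross-Attention for Long Visual Inputs in Multimodal Large Language Models | Proposes LV-XAttn to address the cross-attention computation bottleneck for large-scale visual inputs. | large language model, multimodal
15 | Looking Locally: Object-Centric Vision Transformers as Foundation Models for Efficient Segmentation | FLIP: an object-centric vision Transformer for efficient segmentation. | foundation model
16 | Personalization Toolkit: Training Free Personalization of Large Vision Language Models | Proposes a training-free personalization toolkit for LVLMs and builds a real-world personalization benchmark. | foundation model
17 | Deep Learning-Based Facial Expression Recognition for the Elderly: A Systematic Review | Survey: deep-learning-based facial expression recognition for the elderly and its challenges. | multimodal
18 | EditIQ: Automated Cinematic Editing of Static Wide-Angle Videos via Dialogue Interpretation and Saliency Cues | EditIQ: automated cinematic editing of static wide-angle videos based on dialogue understanding and saliency cues. | large language model
19 | AutoGUI: Scaling GUI Grounding with Automatic Functionality Annotations from LLMs | AutoGUI: uses LLMs to automatically annotate GUI functionality, extending VLM applications in GUI scenarios. | large language model
20 | Mitigating Object Hallucinations in Large Vision-Language Models via Attention Calibration | Proposes an attention calibration method to mitigate object hallucinations in large vision-language models. | multimodal
21 | D-Attn: Decomposed Attention for Large Vision-and-Language Models | Proposes the decomposed attention mechanism D-Attn to improve the performance and efficiency of large vision-language models. | large language model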

🔬 Pillar 3: Perception & Semantics (3 papers)

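# | Title | Key point | Tags | 🔗
22 | GP-GS: Gaussian Processes Densification for 3D Gaussian Splatting | GP-GS: proposes a Gaussian-process-based point cloud densification method for 3D Gaussian Splatting that improves rendering quality. | 3D gaussian splatting, 3DGS, gaussian splatting
23 | DOC-Depth: A novel approach for dense depth ground truth generation | DOC-Depth: a new method for generating dense depth ground truth that improves depth estimation in dynamic environments. | depth estimation
24 | Geometric Neural Process Fields | Proposes Geometric Neural Process Fields (G-NPF) to improve the generalization of neural fields to new signals. | neural radiance field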

🔬 Pillar 4: Generative Motion (3 papers)

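# | Title | Key point | Tags | 🔗
25 | CASIM: Composite Aware Semantic Injection for Text to Motion Generation | Proposes CASIM, which improves the quality and controllability of text-to-motion generation through composite-aware semantic injection. | text-to-motion, motion generation
26 | VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models | VideoJAM: enhances motion generation in video models via joint appearance-motion representations. | motion generation
27 | Muographic Image Upsampling with Machine Learning for Built Infrastructure Applications | Proposes a deep-learning-based muographic image upsampling method to speed up non-destructive inspection of bridges and other infrastructure. | penetration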

🔬 Pillar 1: Robot Control (3 papers)

# | Title | Key point | Tags | 🔗
28 | Articulate AnyMesh: Open-Vocabulary 3D Articulated Objects Modeling | Articulate AnyMesh: proposes an open-vocabulary framework for modeling 3D articulated objects. | manipulation, open-vocabulary, open vocabulary
29 | IMDPrompter: Adapting SAM to Image Manipulation Detection by Cross-View Automated Prompt Learning | IMDPrompter: adapts SAM to image manipulation detection through cross-view automated prompt learning. | manipulation
30 | Diff9D: Diffusion-Based Domain-Generalized Category-Level 9-DoF Object Pose Estimation | Proposes Diff9D, a diffusion-based, domain-generalized 9-DoF object pose estimation method. | manipulation

🔬 Pillar 6: Video Extraction (1 paper)

# | Title | Key point | Tags | 🔗
31 | Hier-EgoPack: Hierarchical Egocentric Video Understanding with Diverse Task Perspectives | Hier-EgoPack: a hierarchical egocentric video understanding framework for diverse task perspectives. | egocentric, Ego4D

🔬 Pillar 8: Physics-based Animation (1 paper)

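# | Title | Key point | Tags | 🔗
32 | A Robust Remote Photoplethysmography Method | Proposes a robust remote photoplethysmography (rPPG) method that improves heart-rate measurement accuracy in complex environments. | PULSE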
