cs.CV (2025-01-27)

📊 23 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (8) · Pillar 3: Perception & Semantics (6 🔗1) · Pillar 2: RL & Architecture (6 🔗1) · Pillar 4: Generative Motion (2 🔗1) · Pillar 7: Motion Retargeting (1)

🔬 Pillar 9: Embodied Foundation Models (8 papers)

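| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 1 | VLMaterial: Procedural Material Generation with Large Vision-Language Models | VLMaterial: generates procedural materials with large vision-language models. | large language model | |
| 2 | FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers | FALCON: uses visual registers to resolve visual redundancy and fragmentation in high-resolution multimodal large language models. | large language model, multimodal | |
| 3 | Can Multimodal Large Language Models be Guided to Improve Industrial Anomaly Detection? | Proposes Echo, a multi-expert framework that improves multimodal large language models on industrial anomaly detection. | large language model, multimodal | |
| 4 | LoRA-X: Bridging Foundation Models with Training-Free Cross-Model Adaptation | Proposes LoRA-X, which transfers LoRA modules between different base models without training, avoiding retraining after a model is replaced. | foundation model | |
| 5 | Cross-Domain Semantic Segmentation with Large Language Model-Assisted Descriptor Generation | LangSeg: uses LLM-assisted descriptor generation to improve cross-domain semantic segmentation. | large language model | |
| 6 | Large Models in Dialogue for Active Perception and Anomaly Detection | Proposes an active perception framework based on LLM dialogue for anomaly detection in autonomous UAV surveillance. | large language model, multimodal | |
| 7 | DynAlign: Unsupervised Dynamic Taxonomy Alignment for Cross-Domain Segmentation | DynAlign: an unsupervised dynamic taxonomy alignment method for cross-domain segmentation. | foundation model | |
| 8 | Understanding Long Videos via LLM-Powered Entity Relation Graphs | Proposes GraphVideoAgent, which uses LLM-driven entity relation graphs to improve long-video understanding. | large language model | |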

🔬 Pillar 3: Perception & Semantics (6 papers)

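| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 9 | Deformable Beta Splatting | Proposes Deformable Beta Splatting for 3D radiance field reconstruction. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 10 | Controllable Hand Grasp Generation for HOI and Efficient Evaluation Methods | Proposes a controllable hand grasp generation method based on a higher-order geometric representation, and designs efficient evaluation metrics. | affordance, HOI | |
| 11 | Generating customized prompts for Zero-Shot Rare Event Medical Image Classification using LLM | Uses an LLM to generate customized prompts for zero-shot classification of rare events in medical images. | open-vocabulary, open vocabulary, large language model | |
| 12 | Automatic Calibration of a Multi-Camera System with Limited Overlapping Fields of View for 3D Surgical Scene Reconstruction | Proposes an automatic multi-camera calibration method based on multi-scale marker projection for 3D surgical scene reconstruction. | scene reconstruction | |
| 13 | PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding | Proposes the PhysBench benchmark and the PhysAgent framework to improve vision-language models' understanding of the physical world. | scene understanding, embodied AI | |
| 14 | LinPrim: Linear Primitives for Differentiable Volumetric Rendering | Proposes a volumetric rendering method based on linear primitives for efficient, differentiable novel view synthesis. | NeRF | |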

🔬 Pillar 2: RL & Architecture (6 papers)

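| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 15 | Distilling foundation models for robust and efficient models in digital pathology | Proposes H0-mini, which uses knowledge distillation to improve model robustness and efficiency in digital pathology. | distillation, foundation model | |
| 16 | A Survey on Computational Pathology Foundation Models: Datasets, Adaptation Strategies, and Evaluation Tasks | A survey of foundation models in computational pathology, covering datasets, adaptation strategies, and evaluation tasks. | contrastive learning, foundation model | |
| 17 | Directing Mamba to Complex Textures: An Efficient Texture-Aware State Space Model for Image Restoration | Proposes TAMambaIR, an efficient texture-aware state space model for image restoration. | Mamba, state space model | |
| 18 | ARFlow: Autoregressive Flow with Hybrid Linear Attention | ARFlow: a flow model that combines autoregressive modeling with hybrid linear attention to improve image generation quality. | linear attention, classifier-free guidance | |
| 19 | The Linear Attention Resurrection in Vision Transformer | Proposes L$^2$ViT, which combines linear attention with local attention for efficient global representation learning. | linear attention | |
| 20 | NanoHTNet: Nano Human Topology Network for Efficient 3D Human Pose Estimation | Proposes NanoHTNet to make 3D human pose estimation efficient on edge devices. | contrastive learning, implicit representation | |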

🔬 Pillar 4: Generative Motion (2 papers)

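| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 21 | PackDiT: Joint Human Motion and Text Generation via Mutual Prompting | PackDiT: joint human motion and text generation via mutual prompting. | text-to-motion, motion generation | |
| 22 | BAG: Body-Aligned 3D Wearable Asset Generation | Proposes BAG, a body-aligned 3D wearable asset generation method that fits assets to the body automatically. | penetration | |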

🔬 Pillar 7: Motion Retargeting (1 paper)

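| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 23 | SketchYourSeg: Mask-Free Subjective Image Segmentation via Freehand Sketches | SketchYourSeg: a sketch-based, mask-free framework for subjective image segmentation. | spatial relationship, foundation model | |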
