cs.CV (2025-01-27)

📊 23 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (8) · Pillar 3: Perception & Semantics (6 🔗1) · Pillar 2: RL & Architecture (6 🔗1) · Pillar 4: Generative Motion (2 🔗1) · Pillar 7: Motion Retargeting (1)

🔬 Pillar 9: Embodied Foundation Models (8 papers)

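#	Title	One-Sentence Summary	Tags	🔗	⭐
1	VLMaterial: Procedural Material Generation with Large Vision-Language Models	VLMaterial: generates procedural materials using large vision-language models	large language model
2	FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers	FALCON: resolves visual redundancy and fragmentation in high-resolution multimodal large language models via visual registers	large language model, multimodal
3	Can Multimodal Large Language Models be Guided to Improve Industrial Anomaly Detection?	Proposes Echo, a multi-expert framework that improves the performance of multimodal large language models on industrial anomaly detection	large language model, multimodal
4	LoRA-X: Bridging Foundation Models with Training-Free Cross-Model Adaptation	Proposes LoRA-X, which transfers LoRA modules across different base models without training, avoiding retraining after a base model is replaced	foundation model
5	Cross-Domain Semantic Segmentation with Large Language Model-Assisted Descriptor Generation	LangSeg: uses a large language model to assist descriptor generation, improving cross-domain semantic segmentation	large language model
6	Large Models in Dialogue for Active Perception and Anomaly Detection	Proposes an active perception framework based on LLM dialogue for anomaly detection in autonomous drone surveillance	large language model, multimodal
7	DynAlign: Unsupervised Dynamic Taxonomy Alignment for Cross-Domain Segmentation	DynAlign: an unsupervised dynamic taxonomy alignment method for cross-domain segmentation	foundation model
8	Understanding Long Videos via LLM-Powered Entity Relation Graphs	Proposes GraphVideoAgent, which uses LLM-driven entity relation graphs to improve long-video understanding	large language model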

🔬 Pillar 3: Perception & Semantics (6 papers)

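#	Title	One-Sentence Summary	Tags	🔗	⭐
9	Deformable Beta Splatting	Proposes Deformable Beta Splatting for 3D radiance field reconstruction	3D gaussian splatting, 3DGS, gaussian splatting	✅
10	Controllable Hand Grasp Generation for HOI and Efficient Evaluation Methods	Proposes a controllable hand grasp generation method based on a higher-order geometric representation, along with efficient evaluation metrics	affordance, HOI
11	Generating customized prompts for Zero-Shot Rare Event Medical Image Classification using LLM	Uses an LLM to generate customized prompts for zero-shot rare-event medical image classification	open-vocabulary, open vocabulary, large language model
12	Automatic Calibration of a Multi-Camera System with Limited Overlapping Fields of View for 3D Surgical Scene Reconstruction	Proposes an automatic multi-camera calibration method based on multi-scale marker projection for 3D surgical scene reconstruction	scene reconstruction
13	PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding	Proposes the PhysBench benchmark and the PhysAgent framework to improve vision-language models' understanding of the physical world	scene understanding, embodied AI
14	LinPrim: Linear Primitives for Differentiable Volumetric Rendering	Proposes a volumetric rendering method based on linear primitives for efficient, differentiable novel view synthesis	NeRF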

🔬 Pillar 2: RL & Architecture (6 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
15	Distilling foundation models for robust and efficient models in digital pathology	Proposes H0-mini, which uses knowledge distillation to improve model robustness and efficiency in digital pathology	distillation, foundation model
16	A Survey on Computational Pathology Foundation Models: Datasets, Adaptation Strategies, and Evaluation Tasks	A survey of foundation models in computational pathology: datasets, adaptation strategies, and evaluation tasks	contrastive learning, foundation model
17	Directing Mamba to Complex Textures: An Efficient Texture-Aware State Space Model for Image Restoration	Proposes TAMambaIR, an efficient texture-aware state space model for image restoration	Mamba, state space model
18	ARFlow: Autoregressive Flow with Hybrid Linear Attention	ARFlow: a flow model that combines autoregressive modeling with hybrid linear attention to improve image generation quality	linear attention, classifier-free guidance
19	The Linear Attention Resurrection in Vision Transformer	Proposes L$^2$ViT, which combines linear attention with local attention for efficient global representation learning	linear attention
20	NanoHTNet: Nano Human Topology Network for Efficient 3D Human Pose Estimation	Proposes NanoHTNet for efficient 3D human pose estimation on edge devices	contrastive learning, implicit representation	✅

🔬 Pillar 4: Generative Motion (2 papers)

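#	Title	One-Sentence Summary	Tags	🔗	⭐
21	PackDiT: Joint Human Motion and Text Generation via Mutual Prompting	PackDiT: joint human motion and text generation via mutual prompting	text-to-motion, motion generation
22	BAG: Body-Aligned 3D Wearable Asset Generation	Proposes BAG, a body-aligned 3D wearable asset generation method that fits assets onto the body automatically	penetration	✅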

🔬 Pillar 7: Motion Retargeting (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
23	SketchYourSeg: Mask-Free Subjective Image Segmentation via Freehand Sketches	SketchYourSeg: proposes a sketch-based, mask-free framework for subjective image segmentation	spatial relationship, foundation model
