cs.CV (2025-01-15)

📊 21 papers in total | 🔗 9 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (7 🔗3) Pillar 3: Perception & Semantics (6 🔗3) Pillar 2: RL & Architecture (4 🔗3) Pillar 7: Motion Retargeting (2) Pillar 4: Generative Motion (1) Pillar 1: Robot Control (1)

🔬 Pillar 9: Embodied Foundation Models (7 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	Few-Shot Adaptation of Training-Free Foundation Model for 3D Medical Image Segmentation	Proposes FATE-SAM, a training-free few-shot adaptation method for 3D medical image segmentation	foundation model
2	Multimodal LLMs Can Reason about Aesthetics in Zero-Shot	Proposes ArtCoT to improve multimodal LLMs' zero-shot aesthetic reasoning	multimodal	✅
3	Spatio-Temporal Foundation Models: Vision, Challenges, and Opportunities	Discusses spatio-temporal foundation models, analyzing their vision, challenges, and opportunities to encourage broader adoption	foundation model
4	Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation	DETRIS: a densely connected parameter-efficient tuning framework for referring image segmentation	foundation model multimodal	✅
5	Unified Few-shot Crack Segmentation and its Precise 3D Automatic Measurement in Concrete Structures	Proposes a unified method for few-shot concrete crack segmentation and precise automatic 3D measurement	foundation model
6	IDEA: Image Description Enhanced CLIP-Adapter	Proposes IDEA, an image-description-enhanced CLIP-Adapter that improves few-shot image classification	multimodal	✅
7	RealVVT: Towards Photorealistic Video Virtual Try-on via Spatio-Temporal Consistency	RealVVT: photorealistic video virtual try-on via spatio-temporal consistency	foundation model

🔬 Pillar 3: Perception & Semantics (6 papers)

#	Title	One-line summary	Tags	🔗	⭐
8	BloomScene: Lightweight Structured 3D Gaussian Splatting for Crossmodal Scene Generation	BloomScene: lightweight structured 3D Gaussian splatting for cross-modal scene generation	monocular depth 3D gaussian splatting gaussian splatting
9	Embodied Scene Understanding for Vision Language Models via MetaVQA	Proposes MetaVQA to address the evaluation of spatial reasoning in vision-language models	scene understanding spatial relationship embodied AI	✅
10	MonSter++: Unified Stereo Matching, Multi-view Stereo, and Real-time Stereo with Monodepth Priors	MonSter++: a unified framework for stereo matching and multi-view stereo that incorporates monocular depth priors	depth estimation monocular depth metric depth
11	CityLoc: 6DoF Pose Distributional Localization for Text Descriptions in Large-Scale Scenes with Gaussian Representation	CityLoc: uses a Gaussian representation to localize 6DoF poses from text descriptions in large-scale scenes	3D gaussian splatting gaussian splatting splatting
12	BRIGHT-VO: Brightness-Guided Hybrid Transformer for Visual Odometry with Multi-modality Refinement Module	BrightVO: a brightness-guided hybrid Transformer for visual odometry with a multi-modality refinement module, improving performance in low-light environments	visual odometry	✅
13	ZeroStereo: Zero-shot Stereo Matching from Single Images	Proposes ZeroStereo to address the generalization of stereo matching to real-world scenes	depth estimation monocular depth scene flow	✅

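🔬 Pillar 2: RL & Architecture (4 papers)

#	Title	One-line summary	Tags	🔗	⭐
14	Vision Foundation Models for Computed Tomography	Proposes CT-FM, a vision foundation model built on large-scale CT scans for medical image segmentation and understanding	contrastive learning foundation model
15	FlexiClip: Locality-Preserving Free-Form Character Animation	FlexiClip: a locality-preserving method for free-form cartoon character animation that improves animation quality	flow matching character animation	✅
16	Perspective-Aware Teaching: Adapting Knowledge for Heterogeneous Distillation	Proposes a perspective-aware teaching framework for knowledge distillation across heterogeneous architectures	distillation	✅
17	MANTA: Diffusion Mamba for Efficient and Effective Stochastic Long-Term Dense Anticipation	Proposes MANTA, an efficient Diffusion Mamba-based method for long-term dense action anticipation	Mamba	✅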

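🔬 Pillar 7: Motion Retargeting (2 papers)

#	Title	One-line summary	Tags	🔗	⭐
18	RepVideo: Rethinking Cross-Layer Representation for Video Generation	RepVideo: improves temporal consistency and spatial accuracy in video generation by restructuring cross-layer representations	spatial relationship
19	Computerized Assessment of Motor Imitation for Distinguishing Autism in Video (CAMI-2DNet)	Proposes CAMI-2DNet, a deep-learning method for assessing motor imitation to distinguish individuals with autism	motion retargeting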

🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
20	Patch-aware Vector Quantized Codebook Learning for Unsupervised Visual Defect Detection	Proposes a patch-aware vector-quantized codebook learning method for unsupervised visual defect detection	VQ-VAE

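🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
21	Joint Learning of Depth and Appearance for Portrait Image Animation	Proposes a diffusion-based framework that jointly learns depth and appearance for high-quality portrait image animation	manipulation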
