cs.CV (2025-01-15)

📊 21 papers in total | 🔗 9 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (7 🔗3) · Pillar 3: Perception & Semantics (6 🔗3) · Pillar 2: RL & Architecture (4 🔗3) · Pillar 7: Motion Retargeting (2) · Pillar 4: Generative Motion (1) · Pillar 1: Robot Control (1)

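🔬 Pillar 9: Embodied Foundation Models (7 papers)

| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Few-Shot Adaptation of Training-Free Foundation Model for 3D Medical Image Segmentation | Proposes FATE-SAM, enabling training-free few-shot adaptation for 3D medical image segmentation | foundation model | |
| 2 | Multimodal LLMs Can Reason about Aesthetics in Zero-Shot | Proposes ArtCoT to improve multimodal LLMs' performance on zero-shot aesthetic reasoning | multimodal | |
| 3 | Spatio-Temporal Foundation Models: Vision, Challenges, and Opportunities | Discusses spatio-temporal foundation models, analyzing their vision, challenges, and opportunities with the aim of enabling broader adoption | foundation model | |
| 4 | Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation | DETRIS: a densely connected parameter-efficient tuning framework for referring image segmentation | foundation model, multimodal | |
| 5 | Unified Few-shot Crack Segmentation and its Precise 3D Automatic Measurement in Concrete Structures | Proposes a unified method for few-shot concrete crack segmentation and precise automatic 3D measurement | foundation model | |
| 6 | IDEA: Image Description Enhanced CLIP-Adapter | Proposes IDEA, an image-description-enhanced CLIP-Adapter for improving few-shot image classification | multimodal | |
| 7 | RealVVT: Towards Photorealistic Video Virtual Try-on via Spatio-Temporal Consistency | RealVVT: photorealistic video virtual try-on via spatio-temporal consistency | foundation model | |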

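🔬 Pillar 3: Perception & Semantics (6 papers)

| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 8 | BloomScene: Lightweight Structured 3D Gaussian Splatting for Crossmodal Scene Generation | BloomScene: lightweight structured 3D Gaussian splatting for crossmodal scene generation | monocular depth, 3D gaussian splatting, gaussian splatting | |
| 9 | Embodied Scene Understanding for Vision Language Models via MetaVQA | Proposes MetaVQA to address the evaluation of spatial reasoning in vision language models | scene understanding, spatial relationship, embodied AI | |
| 10 | MonSter++: Unified Stereo Matching, Multi-view Stereo, and Real-time Stereo with Monodepth Priors | MonSter++: a unified framework for stereo matching and multi-view stereo that incorporates monocular depth priors | depth estimation, monocular depth, metric depth | |
| 11 | CityLoc: 6DoF Pose Distributional Localization for Text Descriptions in Large-Scale Scenes with Gaussian Representation | CityLoc: uses a Gaussian representation to localize 6DoF poses for text descriptions in large-scale scenes | 3D gaussian splatting, gaussian splatting, splatting | |
| 12 | BRIGHT-VO: Brightness-Guided Hybrid Transformer for Visual Odometry with Multi-modality Refinement Module | BrightVO: a brightness-guided hybrid Transformer for visual odometry, combined with a multi-modality refinement module to improve low-light performance | visual odometry | |
| 13 | ZeroStereo: Zero-shot Stereo Matching from Single Images | Proposes ZeroStereo to address the generalization of stereo matching in real-world scenes | depth estimation, monocular depth, scene flow | |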

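🔬 Pillar 2: RL & Architecture (4 papers)

| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 14 | Vision Foundation Models for Computed Tomography | Proposes CT-FM, a vision foundation model for medical image segmentation and understanding, built on large-scale CT scans | contrastive learning, foundation model | |
| 15 | FlexiClip: Locality-Preserving Free-Form Character Animation | FlexiClip: a locality-preserving free-form cartoon character animation method that improves animation quality | flow matching, character animation | |
| 16 | Perspective-Aware Teaching: Adapting Knowledge for Heterogeneous Distillation | Proposes a perspective-aware teaching framework for knowledge distillation across heterogeneous architectures | distillation | |
| 17 | MANTA: Diffusion Mamba for Efficient and Effective Stochastic Long-Term Dense Anticipation | Proposes MANTA, an efficient Diffusion Mamba-based method for long-term dense action anticipation | Mamba | |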

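🔬 Pillar 7: Motion Retargeting (2 papers)

| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 18 | RepVideo: Rethinking Cross-Layer Representation for Video Generation | RepVideo: improves temporal consistency and spatial accuracy in video generation by restructuring cross-layer representations | spatial relationship | |
| 19 | Computerized Assessment of Motor Imitation for Distinguishing Autism in Video (CAMI-2DNet) | Proposes CAMI-2DNet, a deep-learning-based motor imitation assessment method for distinguishing individuals with autism | motion retargeting | |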

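🔬 Pillar 4: Generative Motion (1 paper)

| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 20 | Patch-aware Vector Quantized Codebook Learning for Unsupervised Visual Defect Detection | Proposes a patch-aware vector-quantized codebook learning method for unsupervised visual defect detection | VQ-VAE | |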

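🔬 Pillar 1: Robot Control (1 paper)

| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 21 | Joint Learning of Depth and Appearance for Portrait Image Animation | Proposes a diffusion-based framework that jointly learns depth and appearance for high-quality portrait image animation | manipulation | |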
