cs.CV (2025-01-31)

📊 27 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (10 🔗2) · Pillar 3: Perception & Semantics (7 🔗1) · Pillar 2: RL & Architecture (6 🔗3) · Pillar 4: Generative Motion (3) · Pillar 1: Robot Control (1)

🔬 Pillar 9: Embodied Foundation Models (10 papers)

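# | Title | One-sentence summary | Tags
1 | AIN: The Arabic INclusive Large Multimodal Model | Proposes AIN, an Arabic-inclusive large multimodal model that outperforms GPT-4o across multiple domains. | large language model, multimodal
2 | Text-to-CAD Generation Through Infusing Visual Feedback in Large Language Models | CADFusion: augments large language models with visual feedback to generate CAD models from text. | large language model, multimodal
3 | CerraData-4MM: A multimodal benchmark dataset on Cerrado for land use and land cover classification | Proposes the CerraData-4MM multimodal dataset for land use and land cover classification in the Cerrado. | multimodal
4 | Transformation trees -- documentation of multimodal image registration | Proposes a transformation-tree-based approach to multimodal image registration that improves traceability and reproducibility in medical image processing. | multimodal
5 | Deep Ensembling with Multimodal Image Fusion for Efficient Classification of Lung Cancer | Proposes the DEMF network, based on deep ensembling and multimodal image fusion, for efficient lung cancer classification. | multimodal
6 | Fairness Analysis of CLIP-Based Foundation Models for X-Ray Image Classification | Analyzes the fairness of CLIP-style models across demographic attributes in X-ray image classification. | foundation model
7 | PixelWorld: How Far Are We from Perceiving Everything as Pixels? | Proposes the PixelWorld benchmark, which explores a unified "everything as pixels" perception paradigm for evaluating vision-language models. | multimodal, chain-of-thought
8 | RedundancyLens: Revealing and Exploiting Visual Token Processing Redundancy for Efficient Decoder-Only MLLMs | RedundancyLens reveals and exploits redundancy in visual token processing to make decoder-only MLLMs more efficient. | large language model, multimodal
9 | Poison as Cure: Visual Noise for Mitigating Object Hallucinations in LVMs | Proposes visual adversarial perturbation (VAP) to mitigate object hallucinations in large vision-language models. | large language model
10 | TV-Dialogue: Crafting Theme-Aware Video Dialogues with Immersive Interaction | Proposes the TV-Dialogue framework for generating theme-aware, immersive video dialogues. | multimodal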

🔬 Pillar 3: Perception & Semantics (7 papers)

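# | Title | One-sentence summary | Tags
11 | LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models | LLMDet: strong open-vocabulary object detectors learned under the supervision of large language models. | open-vocabulary, open vocabulary, large language model
12 | JGHand: Joint-Driven Animatable Hand Avater via 3D Gaussian Splatting | Proposes JGHand, a joint-driven animatable hand avatar based on 3D Gaussian splatting that achieves high-quality real-time rendering. | 3D gaussian splatting, 3DGS, gaussian splatting
13 | Advancing Dense Endoscopic Reconstruction with Gaussian Splatting-driven Surface Normal-aware Tracking and Mapping | Proposes Endo-2DTAM, which uses Gaussian-splatting-driven, surface-normal-aware tracking and mapping to improve the accuracy of dense endoscopic reconstruction. | 3D gaussian splatting, 3DGS, gaussian splatting
14 | RaySplats: Ray Tracing based Gaussian Splatting | RaySplats: proposes a ray-tracing-based Gaussian splatting method to address problems with lighting, shadows, and reflections. | 3D gaussian splatting, 3DGS, gaussian splatting
15 | Contrast-Aware Calibration for Fine-Tuned CLIP: Leveraging Image-Text Alignment | Proposes Contrast-Aware Calibration (CAC) to improve the confidence calibration of fine-tuned CLIP in open-vocabulary classification. | open-vocabulary, open vocabulary, multimodal
16 | A Survey on Class-Agnostic Counting: Advancements from Reference-Based to Open-World Text-Guided Approaches | Surveys class-agnostic counting methods, covering reference-based, reference-free, and open-world text-guided approaches. | open-vocabulary, open vocabulary
17 | Lifting by Gaussians: A Simple, Fast and Flexible Method for 3D Instance Segmentation | Proposes the Lifting By Gaussians method for 3D instance segmentation. | 3DGS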

🔬 Pillar 2: RL & Architecture (6 papers)

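# | Title | One-sentence summary | Tags
18 | Laser: Efficient Language-Guided Segmentation in Neural Radiance Fields | Proposes Laser, which achieves efficient language-guided segmentation in neural radiance fields through CLIP feature distillation. | distillation, neural radiance field
19 | EgoMe: A New Dataset and Challenge for Following Me via Egocentric View in Real World | EgoMe: proposes a new dataset and challenge for imitation learning from an egocentric view in the real world. | imitation learning, egocentric
20 | XRF V2: A Dataset for Action Summarization with Wi-Fi Signals, and IMUs in Phones, Watches, Earbuds, and Glasses | Proposes the XRF V2 dataset and the XRFMamba network for action summarization driven by Wi-Fi and IMU signals. | Mamba, foundation model, multimodal
21 | Imagine with the Teacher: Complete Shape in a Multi-View Distillation Way | Proposes a point cloud completion network based on multi-view distillation that improves 3D shape completion. | teacher-student, distillation
22 | RLS3: RL-Based Synthetic Sample Selection to Enhance Spatial Reasoning in Vision-Language Models for Indoor Autonomous Perception | Proposes RLS3, a reinforcement-learning-based synthetic sample selection method that enhances the spatial reasoning of vision-language models in indoor autonomous perception. | reinforcement learning, visual grounding
23 | Improving vision-language alignment with graph spiking hybrid Networks | Proposes GSHN, a graph spiking hybrid network that improves vision-language alignment and strengthens semantic representation. | contrastive learning, spatiotemporal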

🔬 Pillar 4: Generative Motion (3 papers)

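# | Title | One-sentence summary | Tags
24 | MotionPCM: Real-Time Motion Synthesis with Phased Consistency Model | MotionPCM: real-time human motion synthesis based on a phased consistency model. | motion synthesis
25 | GDO:Gradual Domain Osmosis | Proposes the Gradual Domain Osmosis (GDO) method to address knowledge transfer in gradual domain adaptation. | penetration
26 | OmniPhysGS: 3D Constitutive Gaussians for General Physics-Based Dynamics Generation | Proposes OmniPhysGS for generating physics-based dynamics of complex objects. | physically plausible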

🔬 Pillar 1: Robot Control (1 paper)

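# | Title | One-sentence summary | Tags
27 | UP-VLA: A Unified Understanding and Prediction Model for Embodied Agent | Proposes the UP-VLA model, which improves embodied agent performance through unified understanding and prediction objectives. | manipulation, vision-language-action, VLA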
