cs.CV (2025-02-13)

📊 24 papers in total | 🔗 3 with code

🎯 Research Area Navigation

Pillar 9: Embodied Foundation Models (12, 🔗2) · Pillar 3: Spatial Perception & Semantics (6, 🔗1) · Pillar 2: RL Algorithms & Architecture (4) · Pillar 1: Robot Control (2)

🔬 Pillar 9: Embodied Foundation Models (12 papers)

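| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 1 | MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency | Proposes the MME-CoT benchmark to evaluate the reasoning capabilities of multimodal models. | large language model, multimodal, chain-of-thought | |
| 2 | On the robustness of multimodal language model towards distractions | Evaluates the robustness of multimodal language models to visual and textual distractions and proposes mitigation strategies. | multimodal | |
| 3 | ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models | ZeroBench: an "impossible" visual reasoning benchmark designed for contemporary large multimodal models. | multimodal | |
| 4 | Multimodal HIE Lesion Segmentation in Neonates: A Comparative Study of Loss Functions | Proposes an optimized composite loss function for segmenting hypoxic-ischemic encephalopathy (HIE) lesions in neonates. | multimodal | |
| 5 | Exploring the Potential of Encoder-free Architectures in 3D LMMs | Proposes ENEL, the first encoder-free 3D LMM, improving 3D scene understanding. | large language model, multimodal | |
| 6 | A Benchmark for Crime Surveillance Video Analysis with Large Models | Proposes UCVL, a benchmark for evaluating large models on crime surveillance video analysis. | large language model, multimodal | |
| 7 | A Solver-Aided Hierarchical Language for LLM-Driven CAD Design | Proposes AIDL, a solver-aided hierarchical language for LLM-driven CAD design. | large language model | |
| 8 | Long-Term TalkingFace Generation via Motion-Prior Conditional Diffusion Model | Proposes MCDM, which uses a motion-prior conditional diffusion model to generate long-term, coherent talking-face videos. | multimodal | |
| 9 | From Visuals to Vocabulary: Establishing Equivalence Between Image and Text Token Through Autoregressive Pre-training in MLLMs | Proposes VDEP, which strengthens the alignment between image and text tokens in MLLMs through autoregressive pre-training, improving multimodal understanding. | multimodal | |
| 10 | Evolution of Data-driven Single- and Multi-Hazard Susceptibility Mapping and Emergence of Deep Learning Methods | Survey: traces the evolution of data-driven single- and multi-hazard susceptibility mapping and the rise of deep learning methods. | multimodal | |
| 11 | EventSTR: A Benchmark Dataset and Baselines for Event Stream based Scene Text Recognition | Introduces the EventSTR dataset and the SimC-ESTR framework for scene text recognition from event-stream data. | large language model | |
| 12 | DiffoRA: Enabling Parameter-Efficient Fine-Tuning via Differential Module Selection | DiffoRA: parameter-efficient fine-tuning via differential module selection. | large language model | |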

🔬 Pillar 3: Spatial Perception & Semantics (6 papers)

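| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 13 | DenseSplat: Densifying Gaussian Splatting SLAM with Neural Radiance Prior | DenseSplat: densifies Gaussian Splatting SLAM with a neural radiance prior to fill map holes under sparse views. | 3DGS, gaussian splatting, splatting | |
| 14 | Self-Calibrating Gaussian Splatting for Large Field of View Reconstruction | Proposes a self-calibrating Gaussian Splatting method for large field-of-view reconstruction that improves the optimization of camera parameters and lens distortion. | gaussian splatting, splatting, scene reconstruction | |
| 15 | Large Images are Gaussians: High-Quality Large Image Representation with Levels of 2D Gaussian Splatting | LIG: high-quality large-image representation using multiple levels of 2D Gaussian Splatting. | gaussian splatting, splatting | |
| 16 | SteROI-D: System Design and Mapping for Stereo Depth Inference on Regions of Interest | SteROI-D: system design and mapping for stereo depth inference on regions of interest. | depth estimation, stereo depth | |
| 17 | CoL3D: Collaborative Learning of Single-view Depth and Camera Intrinsics for Metric 3D Shape Recovery | CoL3D: jointly learns single-view depth and camera intrinsics for metric 3D shape recovery. | depth estimation, monocular depth | |
| 18 | Vision-based Geo-Localization of Future Mars Rotorcraft in Challenging Illumination Conditions | Proposes Geo-LoFTR to address vision-based localization of Mars rotorcraft under challenging illumination. | visual odometry | |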

🔬 Pillar 2: RL Algorithms & Architecture (4 papers)

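| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 19 | Weight Space Representation Learning on Diverse NeRF Architectures | Proposes an architecture-agnostic framework for weight-space representation learning that handles diverse NeRF architectures. | representation learning, NeRF, neural radiance field | |
| 20 | ConsistentDreamer: View-Consistent Meshes Through Balanced Multi-View Gaussian Optimization | ConsistentDreamer: generates view-consistent meshes through balanced multi-view Gaussian optimization. | dreamer, distillation, embodied AI | |
| 21 | Object-Centric Latent Action Learning | Proposes an object-centric latent action learning framework that tackles action learning for embodied agents in visually complex environments. | imitation learning, embodied AI | |
| 22 | Prior-Constrained Association Learning for Fine-Grained Generalized Category Discovery | Proposes prior-constrained association learning for fine-grained generalized category discovery. | representation learning, distillation | |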

🔬 Pillar 1: Robot Control (2 papers)

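| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 23 | ShapeLib: Designing a library of programmatic 3D shape abstractions with Large Language Models | ShapeLib: uses large language models to design a library of programmatic 3D shape abstractions. | manipulation, large language model | |
| 24 | RigAnything: Template-Free Autoregressive Rigging for Diverse 3D Assets | RigAnything: a template-free autoregressive rigging method for diverse 3D assets. | quadruped, humanoid | |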
