cs.CV (2025-02-13)

📊 24 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (12 🔗2) · Pillar 3: Perception & Semantics (6 🔗1) · Pillar 2: RL & Architecture (4) · Pillar 1: Robot Control (2)

🔬 Pillar 9: Embodied Foundation Models (12 papers)

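#	Title	One-line summary	Tags	🔗	⭐
1	MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency	Proposes the MME-CoT benchmark for evaluating the reasoning capabilities of multimodal models	large language model multimodal chain-of-thought
2	On the robustness of multimodal language model towards distractions	Evaluates the robustness of multimodal language models under visual and textual distractions and proposes mitigation strategies	multimodal
3	ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models	ZeroBench: an impossible visual reasoning benchmark designed for contemporary large multimodal models	multimodal
4	Multimodal HIE Lesion Segmentation in Neonates: A Comparative Study of Loss Functions	Proposes an optimized compound loss function for segmenting hypoxic-ischemic encephalopathy (HIE) lesions in neonates	multimodal
5	Exploring the Potential of Encoder-free Architectures in 3D LMMs	Proposes ENEL, the first encoder-free 3D large language model, improving 3D scene understanding	large language model multimodal	✅
6	A Benchmark for Crime Surveillance Video Analysis with Large Models	Proposes UCVL, a benchmark for evaluating large models on crime surveillance video analysis	large language model multimodal
7	A Solver-Aided Hierarchical Language for LLM-Driven CAD Design	Proposes AIDL, a solver-aided hierarchical language for LLM-driven CAD design	large language model
8	Long-Term TalkingFace Generation via Motion-Prior Conditional Diffusion Model	Proposes MCDM, which uses a motion-prior conditional diffusion model to generate long-term, coherent TalkingFace videos	multimodal
9	From Visuals to Vocabulary: Establishing Equivalence Between Image and Text Token Through Autoregressive Pre-training in MLLMs	Proposes VDEP, which strengthens the alignment between image and text tokens in MLLMs through autoregressive pre-training, improving multimodal understanding	multimodal
10	Evolution of Data-driven Single- and Multi-Hazard Susceptibility Mapping and Emergence of Deep Learning Methods	Survey: traces the evolution of data-driven single- and multi-hazard susceptibility mapping and the rise of deep learning methods	multimodal
11	EventSTR: A Benchmark Dataset and Baselines for Event Stream based Scene Text Recognition	Proposes the EventSTR dataset and the SimC-ESTR framework for scene text recognition from event streams	large language model	✅
12	DiffoRA: Enabling Parameter-Efficient Fine-Tuning via Differential Module Selection	DiffoRA: parameter-efficient fine-tuning via differential module selection	large language model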

🔬 Pillar 3: Perception & Semantics (6 papers)

#	Title	One-line summary	Tags	🔗	⭐
13	DenseSplat: Densifying Gaussian Splatting SLAM with Neural Radiance Prior	DenseSplat: densifies Gaussian splatting SLAM with a neural radiance prior to fill map holes under sparse views	3DGS gaussian splatting splatting
14	Self-Calibrating Gaussian Splatting for Large Field of View Reconstruction	Proposes a self-calibrating Gaussian splatting method for large field-of-view reconstruction that improves the optimization of camera parameters and lens distortion	gaussian splatting splatting scene reconstruction
15	Large Images are Gaussians: High-Quality Large Image Representation with Levels of 2D Gaussian Splatting	LIG: high-quality large-image representation using multiple levels of 2D Gaussian splatting	gaussian splatting splatting	✅
16	SteROI-D: System Design and Mapping for Stereo Depth Inference on Regions of Interest	SteROI-D: a system design and mapping method for stereo depth inference on regions of interest	depth estimation stereo depth
17	CoL3D: Collaborative Learning of Single-view Depth and Camera Intrinsics for Metric 3D Shape Recovery	CoL3D: collaborative learning of single-view depth and camera intrinsics for metric 3D shape recovery	depth estimation monocular depth
18	Vision-based Geo-Localization of Future Mars Rotorcraft in Challenging Illumination Conditions	Proposes Geo-LoFTR to address vision-based localization of Mars rotorcraft under challenging illumination	visual odometry

🔬 Pillar 2: RL & Architecture (4 papers)

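#	Title	One-line summary	Tags	🔗	⭐
19	Weight Space Representation Learning on Diverse NeRF Architectures	Proposes an architecture-agnostic framework for weight-space representation learning across diverse NeRF architectures	representation learning NeRF neural radiance field
20	ConsistentDreamer: View-Consistent Meshes Through Balanced Multi-View Gaussian Optimization	ConsistentDreamer: generates view-consistent meshes through balanced multi-view Gaussian optimization	dreamer distillation embodied AI
21	Object-Centric Latent Action Learning	Proposes an object-centric latent action learning framework to address the difficulty of learning actions in complex visual environments for embodied AI	imitation learning embodied AI
22	Prior-Constrained Association Learning for Fine-Grained Generalized Category Discovery	Proposes a prior-constrained association learning method for fine-grained generalized category discovery	representation learning distillation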

🔬 Pillar 1: Robot Control (2 papers)

#	Title	One-line summary	Tags	🔗	⭐
23	ShapeLib: Designing a library of programmatic 3D shape abstractions with Large Language Models	ShapeLib: uses large language models to design a library of programmatic 3D shape abstractions	manipulation large language model
24	RigAnything: Template-Free Autoregressive Rigging for Diverse 3D Assets	RigAnything: a template-free autoregressive rigging method for diverse 3D assets	quadruped humanoid
