cs.CV (2025-01-17)

📊 16 papers in total | 🔗 7 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (7 🔗4) · Pillar 3: Perception & Semantics (5 🔗2) · Pillar 1: Robot Control (3 🔗1) · Pillar 2: RL & Architecture (1)

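🔬 Pillar 9: Embodied Foundation Models (7 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
1	When language and vision meet road safety: leveraging multimodal large language models for video-based traffic accident analysis	Proposes SeeUnsafe, a framework that uses multimodal large language models for video-based traffic safety analysis and supports interactive accident analysis.	large language model multimodal visual grounding	✅
2	FaceXBench: Evaluating Multimodal LLMs on Face Understanding	Proposes FaceXBench to evaluate the face-understanding capabilities of multimodal large language models.	large language model multimodal chain-of-thought	✅
3	Few-shot Structure-Informed Machinery Part Segmentation with Foundation Models and Graph Neural Networks	Proposes a few-shot, structure-informed machinery part segmentation method based on foundation models and graph neural networks.	foundation model
4	FiLo++: Zero-/Few-Shot Anomaly Detection by Fused Fine-Grained Descriptions and Deformable Localization	FiLo++: zero-/few-shot anomaly detection that fuses fine-grained descriptions with deformable localization.	large language model foundation model multimodal	✅
5	FLORA: Formal Language Model Enables Robust Training-free Zero-shot Object Referring Analysis	Proposes FLORA, which uses a formal language model for robust, training-free, zero-shot comprehension of object referring expressions.	large language model visual grounding
6	HiMix: Reducing Computational Complexity in Large Vision-Language Models	HiMix: reduces the computational complexity of large vision-language models through a mixture attention mechanism with hierarchical vision injection.	large language model	✅
7	Mitigating Hallucinations on Object Attributes using Multiview Images and Negative Instructions	Proposes MIAVLM, which uses multiview images and negative instructions to mitigate LVLM hallucinations on object attributes.	large language model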

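🔬 Pillar 3: Perception & Semantics (5 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
8	Multi-Modal Attention Networks for Enhanced Segmentation and Depth Estimation of Subsurface Defects in Pulse Thermography	Proposes PT-Fusion, which fuses the PCA and TSR modalities to improve the accuracy of defect segmentation and depth estimation in pulse thermography.	depth estimation PULSE
9	GauSTAR: Gaussian Surface Tracking and Reconstruction	GauSTAR: proposes a Gaussian-surface-based method for tracking and reconstructing dynamic scenes that addresses the problem of topology changes.	3D gaussian splatting gaussian splatting splatting	✅
10	High Resolution Tree Height Mapping of the Amazon Forest using Planet NICFI Images and LiDAR-Informed U-Net Model	Maps tree height in the Amazon forest at high resolution using Planet NICFI imagery and a LiDAR-informed U-Net model.	height map
11	One-D-Piece: Image Tokenizer Meets Quality-Controllable Compression	One-D-Piece: a variable-length image tokenization method for quality-controllable compression.	depth estimation
12	Surface-SOS: Self-Supervised Object Segmentation via Neural Surface Representation	Proposes Surface-SOS, which uses a neural surface representation for self-supervised object segmentation.	NeRF	✅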

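🔬 Pillar 1: Robot Control (3 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
13	Zero-Shot Monocular Scene Flow Estimation in the Wild	Proposes a zero-shot monocular scene flow estimation method that improves practicality on in-the-wild scenes.	manipulation predictive model depth estimation
14	FoundationStereo: Zero-Shot Stereo Matching	Proposes FoundationStereo, which achieves zero-shot generalization for stereo matching.	sim-to-real depth estimation stereo depth	✅
15	Disharmony: Forensics using Reverse Lighting Harmonization	Proposes the Disharmony Network, which uses lighting harmonization data to improve detection of image manipulation and generated content.	manipulation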

🔬 Pillar 2: RL & Architecture (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
16	Spatio-temporal Graph Learning on Adaptive Mined Key Frames for High-performance Multi-Object Tracking	Proposes a spatio-temporal graph learning method for multi-object tracking based on adaptive key-frame mining.	reinforcement learning spatial relationship
