cs.CV (2024-10-29)

📊 19 papers in total | 🔗 7 with code

🎯 Research Area Navigation

Pillar 9: Embodied Foundation Models (8 🔗3) | Pillar 3: Spatial Perception & Semantics (6 🔗1) | Pillar 2: RL Algorithms & Architecture (2 🔗2) | Pillar 1: Robot Control (2 🔗1) | Pillar 8: Physics-based Animation (1)

🔬 Pillar 9: Embodied Foundation Models (8 papers)

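#	Title	One-line Summary	Tags	🔗	⭐
1	Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective	Survey: autoregressive approaches to unifying understanding and generation in vision foundation models	large language model, foundation model	✅
2	Multimodality Helps Few-shot 3D Point Cloud Semantic Segmentation	Proposes MM-FSS, a multimodal fusion network for few-shot 3D point cloud semantic segmentation.	multimodal	✅
3	ContextIQ: A Multimodal Expert-Based Video Retrieval System for Contextual Advertising	ContextIQ: a multimodal expert-based video retrieval system for contextual advertising	multimodal
4	A Survey on RGB, 3D, and Multimodal Approaches for Unsupervised Industrial Image Anomaly Detection	Survey: RGB, 3D, and multimodal methods for unsupervised industrial image anomaly detection	multimodal	✅
5	Enhanced Survival Prediction in Head and Neck Cancer Using Convolutional Block Attention and Multimodal Data Fusion	Proposes a deep learning model that combines the Convolutional Block Attention Module (CBAM) with multimodal data fusion to improve survival prediction accuracy in head and neck cancer.	multimodal
6	Dreaming Out Loud: A Self-Synthesis Approach For Training Vision-Language Models With Developmentally Plausible Data	Proposes a self-synthesis approach that trains vision-language models in a way modeled on human cognitive development.	large language model, multimodal
7	VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration	VL-Cache: sparsity- and modality-aware KV cache compression for accelerating vision-language model inference	large language model
8	A Lightweight Dual-Branch System for Weakly-Supervised Video Anomaly Detection on Consumer Edge Devices	Proposes RuleVAD, a lightweight dual-branch system for weakly-supervised video anomaly detection on consumer edge devices.	multimodal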

🔬 Pillar 3: Spatial Perception & Semantics (6 papers)

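#	Title	One-line Summary	Tags	🔗	⭐
9	PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting	PF3plat: pose-free, single-pass feed-forward 3D Gaussian Splatting for novel view synthesis from unposed images	depth estimation, monocular depth, 3D gaussian splatting	✅
10	Exploiting Semantic Scene Reconstruction for Estimating Building Envelope Characteristics	Proposes BuildNet3D, which uses semantic scene reconstruction to estimate building envelope characteristics in support of energy-efficient building retrofits.	scene reconstruction
11	Diffusion as Reasoning: Enhancing Object Navigation via Diffusion Model Conditioned on LLM-based Object-Room Knowledge	Proposes a diffusion-model-based object navigation method that uses LLM-derived knowledge to improve environment understanding	semantic map, large language model
12	LiVisSfM: Accurate and Robust Structure-from-Motion with LiDAR and Visual Cues	LiVisSfM: an accurate and robust SfM reconstruction system that fuses LiDAR and visual information	visual odometry, LIO
13	Motion Graph Unleashed: A Novel Approach to Video Prediction	Proposes motion graphs for video prediction, substantially reducing model size and memory usage	optical flow
14	Active Event Alignment for Monocular Distance Estimation	Proposes a monocular distance estimation method based on active event alignment, improving event-camera depth perception in complex scenes.	depth estimation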

🔬 Pillar 2: RL Algorithms & Architecture (2 papers)

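#	Title	One-line Summary	Tags	🔗	⭐
15	Multi-Level Feature Distillation of Joint Teachers Trained on Distinct Image Datasets	Proposes a multi-level feature distillation method in which joint teacher models trained on distinct datasets improve the student model's performance.	teacher-student distillation	✅
16	EI-Nexus: Towards Unmediated and Flexible Inter-Modality Local Feature Extraction and Matching for Event-Image Data	EI-Nexus: an unmediated, flexible framework for cross-modal local feature extraction and matching on event-image data	distillation, feature matching	✅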

🔬 Pillar 1: Robot Control (2 papers)

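#	Title	One-line Summary	Tags	🔗	⭐
17	HRGR: Enhancing Image Manipulation Detection via Hierarchical Region-aware Graph Reasoning	Proposes HRGR, which enhances image manipulation detection through hierarchical region-aware graph reasoning	manipulation	✅
18	DOFS: A Real-world 3D Deformable Object Dataset with Full Spatial Information for Dynamics Model Learning	DOFS: a real-world 3D deformable object dataset with full spatial information for dynamics model learning	manipulation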

🔬 Pillar 8: Physics-based Animation (1 paper)

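#	Title	One-line Summary	Tags	🔗	⭐
19	Spatio-temporal Transformers for Action Unit Classification with Event Cameras	Proposes a spatio-temporal Transformer-based method for action unit classification with event cameras, and builds FACEMORPHIC, a multimodal facial dataset.	spatiotemporal, multimodal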
