cs.CV (2025-12-02)

📊 11 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Large Models (Embodied Foundation Models) (5) · Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (4 🔗1) · Pillar 2: RL Algorithms and Architecture (RL & Architecture) (2 🔗1)

🔬 Pillar 9: Embodied Large Models (Embodied Foundation Models) (5 papers)

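# | Title | One-line summary | Tags | 🔗
1 | Contextual Image Attack: How Visual Context Exposes Multimodal Safety Vulnerabilities | Proposes a contextual image attack method to expose multimodal safety vulnerabilities | large language model, multimodal |
2 | See, Think, Learn: A Self-Taught Multimodal Reasoner | Proposes the See-Think-Learn framework, which uses self-training to improve the multimodal reasoning of vision-language models | multimodal, chain-of-thought |
3 | WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning | Proposes WorldMM, a dynamic multimodal memory agent for long-video reasoning | large language model, multimodal |
4 | Polar Perspectives: Evaluating 2-D LiDAR Projections for Robust Place Recognition with Visual Foundation Models | Uses visual foundation models to study how the choice of LiDAR projection affects robust place recognition | foundation model |
5 | LLM-Guided Material Inference for 3D Point Clouds | Proposes an LLM-guided material inference method that infers material composition from 3D point clouds | large language model |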

🔬 Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (4 papers)

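# | Title | One-line summary | Tags | 🔗
6 | Flux4D: Flow-based Unsupervised 4D Reconstruction | Flux4D: flow-based unsupervised 4D reconstruction of large-scale dynamic scenes | 3D gaussian splatting, 3DGS, gaussian splatting |
7 | Content-Aware Texturing for Gaussian Splatting | Proposes content-aware texturing for Gaussian splatting, improving rendering quality while reducing the parameter count | gaussian splatting, splatting |
8 | SurfFill: Completion of LiDAR Point Clouds via Gaussian Surfel Splatting | SurfFill: completes LiDAR point clouds using Gaussian surfel splatting | splatting |
9 | BEVDilation: LiDAR-Centric Multi-Modal Fusion for 3D Object Detection | BEVDilation: a LiDAR-centric multimodal fusion method for 3D object detection | depth estimation |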

🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (2 papers)

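# | Title | One-line summary | Tags | 🔗
10 | U4D: Uncertainty-Aware 4D World Modeling from LiDAR Sequences | U4D: an uncertainty-aware 4D world modeling method for LiDAR sequences, aimed at autonomous driving | world model, embodied AI |
11 | ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning | ReVSeg: uses reinforcement learning to incentivize the reasoning chain for video segmentation | reinforcement learning |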
