cs.CV (2024-11-06)

📊 14 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (6 🔗3) · Pillar 9: Embodied Foundation Models (5) · Pillar 2: RL & Architecture (2) · Pillar 6: Video Extraction (1)

🔬 Pillar 3: Perception & Semantics (6 papers)

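| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 1 | Textual Decomposition Then Sub-motion-space Scattering for Open-Vocabulary Motion Generation | Proposes DSO-Net, which tackles open-vocabulary motion generation through textual decomposition and sub-motion-space scattering. | open-vocabulary, open vocabulary, text-to-motion |
| 2 | 3DGS-CD: 3D Gaussian Splatting-based Change Detection for Physical Object Rearrangement | Proposes 3DGS-CD, a 3D Gaussian Splatting-based method for detecting physical object rearrangement. | 3D gaussian splatting, 3DGS, gaussian splatting |
| 3 | Structure Consistent Gaussian Splatting with Matching Prior for Few-shot Novel View Synthesis | Proposes SCGaussian, which uses a matching prior and structure-consistent Gaussian splatting for few-shot novel view synthesis. | 3D gaussian splatting, 3DGS, gaussian splatting |
| 4 | GS2Pose: Two-stage 6D Object Pose Estimation Guided by Gaussian Splatting | GS2Pose: two-stage 6D object pose estimation guided by Gaussian splatting. | 3D gaussian splatting, 3DGS, gaussian splatting |
| 5 | Adaptive Stereo Depth Estimation with Multi-Spectral Images Across All Lighting Conditions | Proposes an adaptive multi-spectral stereo depth estimation method based on cross-modal feature matching and degradation masks. | depth estimation, stereo depth, feature matching |
| 6 | Revisiting Disparity from Dual-Pixel Images: Physics-Informed Lightweight Depth Estimation | Proposes a physics-informed, lightweight depth estimation method for dual-pixel images that achieves high performance with a small model. | depth estimation |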

🔬 Pillar 9: Embodied Foundation Models (5 papers)

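| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 7 | Both Text and Images Leaked! A Systematic Analysis of Data Contamination in Multimodal LLM | Proposes the MM-Detect framework for systematically analyzing data contamination in multimodal LLMs. | large language model, multimodal |
| 8 | ReEdit: Multimodal Exemplar-Based Image Editing with Diffusion Models | ReEdit: a multimodal exemplar-based image editing framework built on diffusion models. | multimodal |
| 9 | Face Reconstruction from Face Embeddings using Adapter to a Face Foundation Model | Proposes an adapter-based approach on a face foundation model to reconstruct face images from face embeddings. | foundation model |
| 10 | StreamingBench: Assessing the Gap for MLLMs to Achieve Streaming Video Understanding | StreamingBench: assesses the gap between MLLMs and humans in streaming video understanding. | large language model, multimodal |
| 11 | SA3DIP: Segment Any 3D Instance with Potential 3D Priors | SA3DIP: segments any 3D instance using potential 3D priors, improving zero-shot segmentation performance. | foundation model |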

🔬 Pillar 2: RL & Architecture (2 papers)

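| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 12 | SEE-DPO: Self Entropy Enhanced Direct Preference Optimization | Proposes Self Entropy Enhanced Direct Preference Optimization (SEE-DPO) to improve the training stability and image quality of text-to-image diffusion models. | reinforcement learning, DPO, direct preference optimization |
| 13 | Prion-ViT: Prions-Inspired Vision Transformers for Temperature prediction with Specklegrams | Proposes Prion-ViT to improve the temperature prediction accuracy of fiber specklegram sensors. | predictive model, MAE |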

🔬 Pillar 6: Video Extraction (1 paper)

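| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 14 | These Maps Are Made by Propagation: Adapting Deep Stereo Networks to Road Scenarios with Decisive Disparity Diffusion | D3Stereo: adapts deep stereo matching networks to road scenarios using decisive disparity diffusion. | feature matching |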
