cs.CV (2024-11-12)

📊 20 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (8 🔗1) · Pillar 9: Embodied Foundation Models (5 🔗2) · Pillar 2: RL Algorithms and Architecture (RL & Architecture) (5 🔗1) · Pillar 8: Physics-based Animation (1) · Pillar 6: Video Extraction and Matching (Video Extraction) (1)

🔬 Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (8 papers)

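| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 1 | GaussianCut: Interactive segmentation via graph cut for 3D Gaussian Splatting | GaussianCut: interactive segmentation of 3D Gaussian Splatting via graph cut | 3D gaussian splatting, 3DGS, gaussian splatting |
| 2 | HiCoM: Hierarchical Coherent Motion for Streamable Dynamic Scene with 3D Gaussian Splatting | HiCoM: a hierarchical coherent motion method for streamable dynamic scenes with 3D Gaussian Splatting | 3D gaussian splatting, 3DGS, gaussian splatting |
| 3 | GUS-IR: Gaussian Splatting with Unified Shading for Inverse Rendering | GUS-IR: an inverse rendering framework that combines unified shading with Gaussian Splatting, suited to complex scenes | 3D gaussian splatting, 3DGS, gaussian splatting |
| 4 | DPU: Dynamic Prototype Updating for Multimodal Out-of-Distribution Detection | Proposes the Dynamic Prototype Updating (DPU) framework to address intra-class variation in multimodal OOD detection | optical flow, multimodal |
| 5 | Material Transforms from Disentangled NeRF Representations | Proposes a material transformation method based on disentangled NeRF representations, enabling cross-scene material editing | NeRF, neural radiance field |
| 6 | Projecting Gaussian Ellipsoids While Avoiding Affine Projection Approximation | Proposes an ellipsoid-projection-based 3D Gaussian Splatting method that improves rendering quality in novel view synthesis | 3D gaussian splatting, gaussian splatting, splatting |
| 7 | Scaling Properties of Diffusion Models for Perceptual Tasks | Leverages the scalability of diffusion models to handle perceptual tasks such as depth estimation, optical flow, and amodal segmentation in a unified way | depth estimation, optical flow |
| 8 | ALOcc: Adaptive Lifting-Based 3D Semantic Occupancy and Cost Volume-Based Flow Predictions | Proposes adaptive-lifting-based 3D semantic occupancy prediction and cost-volume-based flow prediction | scene understanding, spatiotemporal |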

🔬 Pillar 9: Embodied Foundation Models (5 papers)

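| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 9 | MSEG-VCUQ: Multimodal SEGmentation with Enhanced Vision Foundation Models, Convolutional Neural Networks, and Uncertainty Quantification for High-Speed Video Phase Detection Data | Proposes MSEG-VCUQ, which combines vision foundation models with CNNs to tackle segmentation of high-speed video phase detection data | foundation model, multimodal |
| 10 | JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation | JanusFlow: combines autoregression and rectified flow for unified multimodal understanding and generation | large language model, multimodal |
| 11 | ImageRAG: Enhancing Ultra High Resolution Remote Sensing Imagery Analysis with ImageRAG | ImageRAG: improves analysis of ultra-high-resolution remote sensing imagery through image retrieval-augmented generation | large language model, multimodal |
| 12 | SimBase: A Simple Baseline for Temporal Video Grounding | SimBase: a simple and effective baseline for temporal video grounding | multimodal |
| 13 | BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions | BLIP3-KALE: proposes a knowledge-augmented, large-scale dense image caption dataset to improve vision-language model performance | multimodal |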

🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (5 papers)

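| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 14 | Aligning Visual Contrastive learning models via Preference Optimization | Proposes a preference-optimization-based alignment method for contrastive learning models, improving robustness and fairness | reinforcement learning, RLHF, DPO |
| 15 | GaussianAnything: Interactive Point Cloud Flow Matching For 3D Object Generation | GaussianAnything: interactive point cloud flow matching for 3D object generation | flow matching |
| 16 | Breaking the Low-Rank Dilemma of Linear Attention | Proposes the Rank-Augmented Linear Attention (RALA) mechanism to break the low-rank dilemma of linear attention | linear attention |
| 17 | Flow Matching Posterior Sampling: A Training-free Conditional Generation for Flow Matching | Proposes a training-free conditional generation method based on flow matching posterior sampling, extending the range of applications of flow matching models | flow matching |
| 18 | Quantifying Knowledge Distillation Using Partial Information Decomposition | Proposes the Redundant Information Distillation (RID) framework, improving the robustness and effectiveness of knowledge distillation under noisy teacher models | distillation |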

🔬 Pillar 8: Physics-based Animation (1 paper)

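| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 19 | A Novel Automatic Real-time Motion Tracking Method in MRI-guided Radiotherapy Using Enhanced Tracking-Learning-Detection Framework with Automatic Segmentation | Proposes the ETLD+ICV framework for automatic, real-time, markerless motion tracking and segmentation in MRI-guided radiotherapy | motion tracking |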

🔬 Pillar 6: Video Extraction and Matching (Video Extraction) (1 paper)

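| # | Title | Key point | Tags |
|---|-------|-----------|------|
| 20 | CameraHMR: Aligning People with Perspective | CameraHMR: improves the accuracy of human pose and shape estimation from monocular images through perspective alignment | SMPL |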
