cs.CV (2024-11-12)

📊 20 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (8 🔗1) Pillar 9: Embodied Foundation Models (5 🔗2) Pillar 2: RL Algorithms & Architecture (5 🔗1) Pillar 8: Physics-based Animation (1) Pillar 6: Video Extraction & Matching (1)

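🔬 Pillar 3: Spatial Perception & Semantics (8 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	GaussianCut: Interactive segmentation via graph cut for 3D Gaussian Splatting	GaussianCut: interactive segmentation of 3D Gaussian Splatting scenes via graph cut	3D gaussian splatting 3DGS gaussian splatting
2	HiCoM: Hierarchical Coherent Motion for Streamable Dynamic Scene with 3D Gaussian Splatting	HiCoM: a hierarchical coherent motion method for streamable dynamic scenes with 3D Gaussian Splatting	3D gaussian splatting 3DGS gaussian splatting
3	GUS-IR: Gaussian Splatting with Unified Shading for Inverse Rendering	GUS-IR: an inverse rendering framework combining unified shading with Gaussian Splatting, suited to complex scenes	3D gaussian splatting 3DGS gaussian splatting
4	DPU: Dynamic Prototype Updating for Multimodal Out-of-Distribution Detection	Proposes the Dynamic Prototype Updating (DPU) framework to address intra-class variation in multimodal OOD detection	optical flow multimodal
5	Material Transforms from Disentangled NeRF Representations	Proposes a material transformation method based on disentangled NeRF representations, enabling cross-scene material editing	NeRF neural radiance field	✅
6	Projecting Gaussian Ellipsoids While Avoiding Affine Projection Approximation	Proposes an ellipsoid-projection-based 3D Gaussian Splatting method that improves rendering quality for novel view synthesis	3D gaussian splatting gaussian splatting splatting
7	Scaling Properties of Diffusion Models for Perceptual Tasks	Leverages the scalability of diffusion models to handle perceptual tasks such as depth estimation, optical flow, and amodal segmentation in a unified way	depth estimation optical flow
8	ALOcc: Adaptive Lifting-Based 3D Semantic Occupancy and Cost Volume-Based Flow Predictions	Proposes adaptive lifting-based 3D semantic occupancy prediction and cost volume-based flow prediction	scene understanding spatiotemporal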

🔬 Pillar 9: Embodied Foundation Models (5 papers)

#	Title	One-line summary	Tags	🔗	⭐
9	MSEG-VCUQ: Multimodal SEGmentation with Enhanced Vision Foundation Models, Convolutional Neural Networks, and Uncertainty Quantification for High-Speed Video Phase Detection Data	Proposes MSEG-VCUQ, which combines vision foundation models with CNNs to tackle segmentation of high-speed video phase detection data	foundation model multimodal
10	JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation	JanusFlow: harmonizes autoregression and rectified flow for unified multimodal understanding and generation	large language model multimodal
11	ImageRAG: Enhancing Ultra High Resolution Remote Sensing Imagery Analysis with ImageRAG	ImageRAG: improves analysis of ultra-high-resolution remote sensing imagery through image retrieval-augmented generation	large language model multimodal	✅
12	SimBase: A Simple Baseline for Temporal Video Grounding	SimBase: a simple and effective baseline for temporal video grounding	multimodal
13	BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions	BLIP3-KALE: presents a knowledge-augmented, large-scale dense image caption dataset that improves vision-language model performance	multimodal	✅

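🔬 Pillar 2: RL Algorithms & Architecture (5 papers)

#	Title	One-line summary	Tags	🔗	⭐
14	Aligning Visual Contrastive learning models via Preference Optimization	Proposes a preference-optimization-based alignment method for contrastive learning models, improving robustness and fairness	reinforcement learning RLHF DPO
15	GaussianAnything: Interactive Point Cloud Flow Matching For 3D Object Generation	GaussianAnything: interactive point cloud flow matching for 3D object generation	flow matching
16	Breaking the Low-Rank Dilemma of Linear Attention	Proposes the Rank-Augmented Linear Attention (RALA) mechanism to break the low-rank dilemma of linear attention	linear attention	✅
17	Flow Matching Posterior Sampling: A Training-free Conditional Generation for Flow Matching	Proposes a training-free conditional generation method based on flow matching posterior sampling, broadening the applicability of flow matching models	flow matching
18	Quantifying Knowledge Distillation Using Partial Information Decomposition	Proposes the Redundant Information Distillation (RID) framework, improving the robustness and effectiveness of knowledge distillation with noisy teacher models	distillation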

🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
19	A Novel Automatic Real-time Motion Tracking Method in MRI-guided Radiotherapy Using Enhanced Tracking-Learning-Detection Framework with Automatic Segmentation	Proposes the ETLD+ICV framework for automatic, real-time, markerless motion tracking and segmentation in MRI-guided radiotherapy	motion tracking

🔬 Pillar 6: Video Extraction & Matching (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
20	CameraHMR: Aligning People with Perspective	CameraHMR: improves human pose and shape estimation from monocular images through perspective alignment	SMPL
