cs.CV (2024-11-20)

📊 30 papers in total | 🔗 13 with code

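🎯 Navigation by interest area

Pillar 2: RL Algorithms and Architecture (RL & Architecture) (13 🔗5) · Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (7 🔗2) · Pillar 9: Embodied Foundation Models (Embodied Foundation Models) (7 🔗4) · Pillar 6: Video Extraction and Matching (Video Extraction) (2 🔗1) · Pillar 4: Generative Motion (Generative Motion) (1 🔗1)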

🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (13 papers)

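| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Unsupervised Foundation Model-Agnostic Slide-Level Representation Learning | Proposes COBRA for representation learning on pathology whole-slide images. | Mamba, representation learning, foundation model | |
| 2 | FAST-Splat: Fast, Ambiguity-Free Semantics Transfer in Gaussian Splatting | FAST-Splat: a fast, ambiguity-free method for semantics transfer in Gaussian Splatting. | distillation, gaussian splatting, splatting | |
| 3 | XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation | Proposes XMask3D, which achieves open-vocabulary 3D semantic segmentation through cross-modal mask reasoning. | distillation, open-vocabulary, open vocabulary | |
| 4 | Efficient Masked AutoEncoder for Video Object Counting and A Large-Scale Benchmark | Proposes E-MAC, a density-embedded efficient masked autoencoder counting framework, to address the dynamic foreground-background imbalance in video object counting. | representation learning, masked autoencoder, optical flow | |
| 5 | Extending Video Masked Autoencoders to 128 frames | Proposes a long-video masked autoencoder (LVMAE) that handles 128-frame videos effectively and improves video understanding performance. | masked autoencoder, MAE, foundation model | |
| 6 | MambaDETR: Query-based Temporal Modeling using State Space Model for Multi-View 3D Object Detection | MambaDETR: query-based temporal modeling with a state space model for multi-view 3D object detection. | Mamba, state space model | |
| 7 | Decompose and Leverage Preferences from Expert Models for Improving Trustworthiness of MLLMs | Proposes the DecompGen framework, which uses expert models to decompose and evaluate MLLM responses, improving their trustworthiness. | preference learning, large language model, multimodal | |
| 8 | Intensity-Spatial Dual Masked Autoencoder for Multi-Scale Feature Learning in Chest CT Segmentation | Proposes an intensity-spatial dual masked autoencoder (ISD-MAE) for multi-scale feature learning and segmentation in chest CT. | masked autoencoder, MAE, contrastive learning | |
| 9 | Find Any Part in 3D | Uses a data engine driven by 2D foundation models to achieve open-world segmentation of any part of a 3D object. | world model, foundation model | |
| 10 | Identity Preserving 3D Head Stylization with Multiview Score Distillation | Proposes a 3D head stylization method based on multiview score distillation that improves identity preservation. | distillation | |
| 11 | Cross-Camera Distracted Driver Classification through Feature Disentanglement and Contrastive Learning | Proposes DBMNet, which performs cross-camera distracted driver classification through feature disentanglement and contrastive learning. | contrastive learning | |
| 12 | Collaborative Feature-Logits Contrastive Learning for Open-Set Semi-Supervised Object Detection | Proposes CFL-Detector to address OOD misclassification in open-set semi-supervised object detection. | contrastive learning | |
| 13 | RobustFormer: Noise-Robust Pre-training for images and videos | RobustFormer: a noise-robust pre-training method for images and videos that uses the DWT to improve Transformer performance under noise. | masked autoencoder, MAE | |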

🔬 Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (7 papers)

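| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 14 | GazeGaussian: High-Fidelity Gaze Redirection with 3D Gaussian Splatting | GazeGaussian: high-fidelity gaze redirection based on 3D Gaussian Splatting. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 15 | Generating 3D-Consistent Videos from Unposed Internet Photos | Proposes a self-supervised method for generating 3D-consistent videos from unposed internet photos. | 3D gaussian splatting, gaussian splatting, splatting | |
| 16 | Sparse Input View Synthesis: 3D Representations and Reliable Priors | Proposes solutions based on 3D representations and reliable priors for novel view synthesis from sparse input views. | NeRF, neural radiance field, optical flow | |
| 17 | Robust SG-NeRF: Robust Scene Graph Aided Neural Surface Reconstruction | Proposes a robust SG-NeRF that uses scene graphs to aid neural surface reconstruction and handle noisy camera poses. | NeRF | |
| 18 | DATAP-SfM: Dynamic-Aware Tracking Any Point for Robust Structure from Motion in the Wild | DATAP-SfM: dynamic-aware tracking of any point for robust structure from motion in the wild. | depth estimation, optical flow | |
| 19 | Geometric Algebra Planes: Convex Implicit Neural Volumes | Proposes GA-Planes, an implicit neural field representation that can be trained by convex optimization, for volume modeling. | implicit representation | |
| 20 | Practical Compact Deep Compressed Sensing | Proposes PCNet, a practical and compact deep compressed sensing network that improves image reconstruction quality. | implicit representation | |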

🔬 Pillar 9: Embodied Foundation Models (Embodied Foundation Models) (7 papers)

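| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 21 | VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation | VideoAutoArena: an arena-style benchmark that automatically evaluates large multimodal models in video analysis through user simulation. | multimodal | |
| 22 | Adapting Vision Foundation Models for Robust Cloud Segmentation in Remote Sensing Images | Proposes Cloud-Adapter, which uses vision foundation models for robust cloud segmentation in remote sensing images. | foundation model | |
| 23 | Hints of Prompt: Enhancing Visual Representation for Multimodal LLMs in Autonomous Driving | Proposes the Hints of Prompt (HoP) framework to enhance the visual representations of multimodal LLMs in autonomous driving scenarios. | multimodal | |
| 24 | MEGL: Multimodal Explanation-Guided Learning | Proposes MEGL, a multimodal explanation-guided learning framework that improves the interpretability and performance of image classification models. | multimodal | |
| 25 | Unsupervised Homography Estimation on Multimodal Image Pair via Alternating Optimization | Proposes AltO, which solves unsupervised homography estimation on multimodal image pairs through alternating optimization. | multimodal | |
| 26 | On the Consistency of Video Large Language Models in Temporal Comprehension | Addresses the consistency of video large language models in temporal comprehension and proposes event temporal verification tuning. | large language model | |
| 27 | FabuLight-ASD: Unveiling Speech Activity via Body Language | FabuLight-ASD: uses body language to enhance speech activity detection in multimodal settings. | multimodal | |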

🔬 Pillar 6: Video Extraction and Matching (Video Extraction) (2 papers)

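| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 28 | DIS-Mine: Instance Segmentation for Disaster-Awareness in Poor-Light Condition in Underground Mines | DIS-Mine: an instance segmentation method for disaster awareness in the poor-light conditions of underground mines. | feature matching | |
| 29 | X as Supervision: Contending with Depth Ambiguity in Unsupervised Monocular 3D Pose Estimation | Proposes an unsupervised monocular 3D pose estimation method based on multi-hypothesis detection and 3D priors. | SMPL | |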

🔬 Pillar 4: Generative Motion (Generative Motion) (1 paper)

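| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 30 | REDUCIO! Generating 1K Video within 16 Seconds using Extremely Compressed Motion Latents | REDUCIO: generates 1K video within 16 seconds using an extremely compressed motion latent space. | motion latent | |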
