cs.CV (2024-11-29)

📊 35 papers in total | 🔗 15 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (14 🔗7) · Pillar 9: Embodied Foundation Models (10 🔗5) · Pillar 2: RL & Architecture (6 🔗3) · Pillar 6: Video Extraction (2) · Pillar 1: Robot Control (1) · Pillar 4: Generative Motion (1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 3: Perception & Semantics (14 papers)

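| # | Title | Key point | Tags |
|---|---|---|---|
| 1 | GuardSplat: Efficient and Robust Watermarking for 3D Gaussian Splatting | GuardSplat: an efficient and robust watermarking scheme for 3D Gaussian Splatting that protects the copyright of 3D assets | 3D gaussian splatting, 3DGS, gaussian splatting |
| 2 | TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting | TexGaussian: generates high-quality PBR materials using octree-based 3D Gaussian Splatting | 3D gaussian splatting, gaussian splatting, splatting |
| 3 | GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding | Proposes the GREAT framework for open-vocabulary 3D object affordance grounding | open-vocabulary, open vocabulary, affordance |
| 4 | T-3DGS: Removing Transient Objects for 3D Scene Reconstruction | T-3DGS: proposes a 3D scene reconstruction method that removes transient objects | 3DGS, gaussian splatting, splatting |
| 5 | Tortho-Gaussian: Splatting True Digital Orthophoto Maps | Tortho-Gaussian: generates true digital orthophoto maps via orthographic Gaussian splatting | 3D gaussian splatting, 3DGS, gaussian splatting |
| 6 | Gaussian Splashing: Direct Volumetric Rendering Underwater | Gaussian Splashing: a fast volumetric rendering method for underwater scenes that improves rendering speed and detail clarity | depth estimation, 3D gaussian splatting, 3DGS |
| 7 | Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding | Proposes FreeGS to address semantic consistency in unsupervised 3D scene understanding | 3D gaussian splatting, 3DGS, gaussian splatting |
| 8 | ROSE: Revolutionizing Open-Set Dense Segmentation with Patch-Wise Perceptual Large Multimodal Model | Proposes ROSE for open-set dense segmentation | open-vocabulary, open vocabulary, multimodal |
| 9 | MonoPP: Metric-Scaled Self-Supervised Monocular Depth Estimation by Planar-Parallax Geometry in Automotive Applications | MonoPP: uses planar-parallax geometry to achieve metric-scaled self-supervised monocular depth estimation in automotive applications | depth estimation, monocular depth |
| 10 | DeSplat: Decomposed Gaussian Splatting for Distractor-Free Rendering | DeSplat: proposes a distractor-free rendering method based on decomposed Gaussian splatting | gaussian splatting, splatting |
| 11 | Uni-SLAM: Uncertainty-Aware Neural Implicit SLAM for Real-Time Dense Indoor Scene Reconstruction | Uni-SLAM: uncertainty-aware neural implicit SLAM for real-time dense indoor scene reconstruction | visual SLAM, scene reconstruction |
| 12 | LokiTalk: Learning Fine-Grained and Generalizable Correspondences to Enhance NeRF-based Talking Head Synthesis | LokiTalk: learns fine-grained, generalizable facial correspondences to enhance NeRF-based talking head synthesis | NeRF, neural radiance field |
| 13 | Quantifying the synthetic and real domain gap in aerial scene understanding | Proposes metrics based on multi-model consensus and depth structure to quantify the domain gap between synthetic and real aerial scenes | scene understanding |
| 14 | Incremental Multi-Scene Modeling via Continual Neural Graphics Primitives | Proposes C-NGP, which uses continual learning to incrementally model multiple scenes in a single neural radiance field | NeRF, neural radiance field |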

🔬 Pillar 9: Embodied Foundation Models (10 papers)

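| # | Title | Key point | Tags |
|---|---|---|---|
| 15 | Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings | Proposes a dynamic visual-token exit mechanism (DyVTE) to accelerate inference in multimodal large language models | large language model, multimodal |
| 16 | SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters | Proposes the SOLAMI framework for social vision-language-action modeling in immersive interaction with 3D autonomous characters | vision-language-action, VLA, multimodal |
| 17 | Interleaved-Modal Chain-of-Thought | Proposes interleaved-modal chain-of-thought (ICoT) to improve vision-language models on complex reasoning tasks | large language model, multimodal, chain-of-thought |
| 18 | GalaxAlign: Mimicking Citizen Scientists' Multimodal Guidance for Galaxy Morphology Analysis | GalaxAlign: a galaxy morphology analysis method that mimics citizen scientists' multimodal guidance | foundation model, multimodal |
| 19 | DLaVA: Document Language and Vision Assistant for Answer Localization with Enhanced Interpretability and Trustworthiness | DLaVA: a document language and vision assistant for answer localization with improved interpretability and trustworthiness | large language model, multimodal, chain-of-thought |
| 20 | Curriculum Fine-tuning of Vision Foundation Model for Medical Image Classification Under Label Noise | Proposes CUFIT, a curriculum fine-tuning method for vision foundation models, for medical image classification under label noise | foundation model |
| 21 | Sparrow: Data-Efficient Video-LLM with Text-to-Image Augmentation | Sparrow: a data-efficient video-LLM method based on text-to-image augmentation | large language model, multimodal |
| 22 | SURE-VQA: Systematic Understanding of Robustness Evaluation in Medical VQA Tasks | SURE-VQA framework: systematically evaluates the robustness of vision-language models on medical VQA tasks | large language model, multimodal |
| 23 | STEP: Enhancing Video-LLMs' Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training | STEP: spatio-temporal graph-guided self-training that enhances the compositional reasoning of video LLMs | large language model, chain-of-thought |
| 24 | LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos | Proposes the LongVALE benchmark for time-aware omni-modal understanding of long videos | large language model |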

🔬 Pillar 2: RL & Architecture (6 papers)

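| # | Title | Key point | Tags |
|---|---|---|---|
| 25 | ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration | ReconDreamer: builds world models via online restoration to improve driving scene reconstruction quality | world model, dreamer, 3DGS |
| 26 | SkelMamba: A State Space Model for Efficient Skeleton Action Recognition of Neurological Disorders | SkelMamba: an efficient state space model for skeleton-based action recognition of neurological disorders | Mamba, SSM, state space model |
| 27 | FlowCLAS: Enhancing Normalizing Flow Via Contrastive Learning For Anomaly Segmentation | FlowCLAS: enhances normalizing flows with contrastive learning for anomaly segmentation | contrastive learning, foundation model |
| 28 | Pretrained Reversible Generation as Unsupervised Visual Representation Learning | Proposes pretrained reversible generation (PRG) for unsupervised visual representation learning, improving downstream task performance | flow matching, representation learning |
| 29 | DELT: A Simple Diversity-driven EarlyLate Training for Dataset Distillation | Proposes DELT to address insufficient diversity in dataset distillation | distillation |
| 30 | FairDD: Fair Dataset Distillation | Proposes the FairDD framework to address bias toward protected attributes in dataset distillation | distillation |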

🔬 Pillar 6: Video Extraction (2 papers)

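| # | Title | Key point | Tags |
|---|---|---|---|
| 31 | SAT-HMR: Real-Time Multi-Person 3D Mesh Estimation via Scale-Adaptive Tokens | Proposes SAT-HMR, based on scale-adaptive tokens, for real-time multi-person 3D human mesh estimation | HMR |
| 32 | FreeCloth: Free-form Generation Enhances Challenging Clothed Human Modeling | FreeCloth: proposes a free-form generation method that improves modeling of challenging clothed humans | SMPL |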

🔬 Pillar 1: Robot Control (1 paper)

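| # | Title | Key point | Tags |
|---|---|---|---|
| 33 | SIMS: Simulating Stylized Human-Scene Interactions with Retrieval-Augmented Script Generation | SIMS: proposes retrieval-augmented script generation to simulate stylized human-scene interactions | locomotion, motion planning, human-scene interaction |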

🔬 Pillar 4: Generative Motion (1 paper)

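| # | Title | Key point | Tags |
|---|---|---|---|
| 34 | MoTe: Learning Motion-Text Diffusion Model for Multiple Generation Tasks | MoTe: learns a motion-text diffusion model to handle multiple motion generation tasks | text-to-motion, text-driven motion, motion generation |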

🔬 Pillar 8: Physics-based Animation (1 paper)

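| # | Title | Key point | Tags |
|---|---|---|---|
| 35 | The Streetscape Application Services Stack (SASS): Towards a Distributed Sensing Architecture for Urban Applications | Proposes SASS to address the fusion and real-time processing of distributed heterogeneous sensor data in urban streetscape applications | spatiotemporal, multimodal |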
