cs.CV (2024-11-27)

📊 49 papers in total | 🔗 13 with code

🎯 Navigation by interest area

Pillar 3: Spatial Perception & Semantics (13 🔗3) · Pillar 2: RL Algorithms & Architecture (11 🔗2) · Pillar 9: Embodied Foundation Models (10 🔗5) · Pillar 1: Robot Control (7 🔗2) · Pillar 4: Generative Motion (4) · Pillar 8: Physics-based Animation (2 🔗1) · Pillar 5: Interaction & Reaction (1) · Pillar 6: Video Extraction & Matching (1)

🔬 Pillar 3: Spatial Perception & Semantics (13 papers)

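# | Title | One-sentence summary | Tags | 🔗
1 | Point Cloud Unsupervised Pre-training via 3D Gaussian Splatting | Proposes the GS$^3$ framework, which uses 3D Gaussian splatting to speed up unsupervised point cloud pre-training, improving efficiency and reducing memory usage. | 3D gaussian splatting, gaussian splatting, splatting |
2 | HEMGS: A Hybrid Entropy Model for 3D Gaussian Splatting Data Compression | Proposes HEMGS, a hybrid entropy model for efficiently compressing 3D Gaussian splatting data, significantly reducing storage size. | 3D gaussian splatting, 3DGS, gaussian splatting |
3 | GLS: Geometry-aware 3D Language Gaussian Splatting | GLS: geometry-aware 3D language Gaussian splatting, a unified framework for surface reconstruction and open-vocabulary segmentation. | 3D gaussian splatting, 3DGS, gaussian splatting |
4 | Helvipad: A Real-World Dataset for Omnidirectional Stereo Depth Estimation | Proposes the Helvipad dataset for omnidirectional stereo depth estimation and improves model performance. | depth estimation, stereo depth |
5 | From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects | Proposes OWEL and MSCAL, giving open-vocabulary object detection models open-world detection capability. | open-vocabulary, open vocabulary |
6 | Textured Gaussians for Enhanced 3D Scene Appearance Modeling | Proposes textured Gaussians to enhance 3D scene appearance modeling. | 3D gaussian splatting, 3DGS, gaussian splatting |
7 | GaussianSpeech: Audio-Driven Gaussian Avatars | GaussianSpeech: proposes audio-driven, highly photorealistic head avatars based on 3D Gaussian splatting. | 3D gaussian splatting, 3DGS, gaussian splatting |
8 | Reconstructing Animals and the Wild | Proposes the RAW framework, which uses large language model priors to reconstruct wild animals and their natural scenes. | scene understanding, large language model |
9 | CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models | CAT4D: creates arbitrary 4D scenes using multi-view video diffusion models. | scene reconstruction, TAMP |
10 | SmileSplat: Generalizable Gaussian Splats for Unconstrained Sparse Images | SmileSplat: proposes a generalizable Gaussian splatting method for 3D reconstruction from unconstrained sparse images. | gaussian splatting, splatting |
11 | An End-to-End Two-Stream Network Based on RGB Flow and Representation Flow for Human Action Recognition | Proposes a two-stream network based on RGB flow and representation flow for end-to-end human action recognition, reducing computational cost. | optical flow, egocentric |
12 | MotionCharacter: Fine-Grained Motion Controllable Human Video Generation | MotionCharacter: proposes a fine-grained, motion-controllable human video generation framework that addresses the difficulty of controlling motion intensity. | optical flow |
13 | RoMo: Robust Motion Segmentation Improves Structure from Motion | RoMo: robust motion segmentation improves SfM camera calibration for dynamic scenes. | optical flow |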

🔬 Pillar 2: RL Algorithms & Architecture (11 papers)

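# | Title | One-sentence summary | Tags | 🔗
14 | InfiniDreamer: Arbitrarily Long Human Motion Generation via Segment Score Distillation | InfiniDreamer: generates human motion of arbitrary length via segment score distillation. | dreamer, distillation, motion diffusion |
15 | SharpDepth: Sharpening Metric Depth Predictions Using Diffusion Distillation | SharpDepth: sharpens monocular depth predictions using diffusion distillation, improving accuracy and detail. | distillation, depth estimation, metric depth |
16 | PhyCAGE: Physically Plausible Compositional 3D Asset Generation from a Single Image | PhyCAGE: physically plausible compositional 3D asset generation from a single image. | distillation, 3D gaussian splatting, gaussian splatting |
17 | Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning | Critic-V: uses VLM critics to improve VLM error correction in multimodal reasoning. | reinforcement learning, DPO, direct preference optimization |
18 | Surf-NeRF: Surface Regularised Neural Radiance Fields | Surf-NeRF: proposes surface-regularised neural radiance fields to improve geometric reconstruction accuracy. | curriculum learning, NeRF, neural radiance field |
19 | AToM: Aligning Text-to-Motion Model at Event-Level with GPT-4Vision Reward | AToM: uses GPT-4Vision rewards to improve the event-level alignment of text-to-motion generation models. | reinforcement learning, text-to-motion, motion generation |
20 | Active Data Curation Effectively Distills Large-Scale Multimodal Models | Proposes ACID, an active data curation method that effectively distills large-scale multimodal models. | distillation, multimodal |
21 | ModeDreamer: Mode Guiding Score Distillation for Text-to-3D Generation using Reference Image Prompts | ModeDreamer: mode distillation for text-to-3D generation, guided by reference image prompts. | dreamer, distillation |
22 | Vision Mamba Distillation for Low-resolution Fine-grained Image Classification | Proposes a Vision Mamba distillation method to improve low-resolution fine-grained image classification. | Mamba, distillation |
23 | Incomplete Multi-view Multi-label Classification via a Dual-level Contrastive Learning Framework | Proposes a dual-level contrastive learning framework for incomplete multi-view multi-label classification. | contrastive learning |
24 | Verbalized Representation Learning for Interpretable Few-Shot Generalization | Proposes VRL, which uses natural-language features to improve few-shot generalization and interpretability. | representation learning |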

🔬 Pillar 9: Embodied Foundation Models (10 papers)

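# | Title | One-sentence summary | Tags | 🔗
25 | Autonomous Imagination: Closed-Loop Decomposition of Visual-to-Textual Conversion in Visual Reasoning for Multimodal Large Language Models | Proposes an autonomous imagination method that addresses the visual-to-textual conversion bottleneck in visual reasoning for multimodal large language models. | large language model, multimodal |
26 | ChatRex: Taming Multimodal LLM for Joint Perception and Understanding | ChatRex: tames multimodal LLMs for joint perception and understanding. | large language model, multimodal |
27 | SimCMF: A Simple Cross-modal Fine-tuning Strategy from Vision Foundation Models to Any Imaging Modality | SimCMF: a simple cross-modal fine-tuning strategy that transfers vision foundation models to any imaging modality. | foundation model |
28 | PDZSeg: Adapting the Foundation Model for Dissection Zone Segmentation with Visual Prompts in Robot-assisted Endoscopic Submucosal Dissection | PDZSeg: fine-tunes a foundation model with visual prompts for accurate dissection zone segmentation in robot-assisted endoscopic submucosal dissection. | foundation model |
29 | RS-vHeat: Heat Conduction Guided Efficient Remote Sensing Foundation Model | Proposes RS-vHeat, a heat-conduction-guided efficient remote sensing foundation model that improves computational efficiency and interpretability. | foundation model |
30 | OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation | Proposes the OpenING benchmark for evaluating open-ended interleaved image-text generation models. | large language model, multimodal |
31 | TimeMarker: A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability | TimeMarker: a versatile video LLM with superior temporal localization ability for long and short video understanding. | large language model, multimodal |
32 | CoVis: A Collaborative Framework for Fine-grained Graphic Visual Understanding | CoVis: a collaborative framework for fine-grained graphic visual understanding that improves the quality of information acquisition. | large language model |
33 | Evaluating Vision-Language Models as Evaluators in Path Planning | Proposes the PathEval benchmark to evaluate the effectiveness of vision-language models in path planning. | large language model |
34 | DHCP: Detecting Hallucinations by Cross-modal Attention Pattern in Large Vision-Language Models | DHCP: detects hallucinations in large vision-language models via cross-modal attention patterns. | multimodal |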

🔬 Pillar 1: Robot Control (7 papers)

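# | Title | One-sentence summary | Tags | 🔗
35 | Neural Surface Priors for Editable Gaussian Splatting | Proposes an editable Gaussian splatting method based on neural surface priors, enabling intuitive modification of scene appearance. | manipulation, 3D gaussian splatting, gaussian splatting |
36 | Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation | Lift3D: lifts 2D pretrained models to achieve robust 3D robotic manipulation. | manipulation, masked autoencoder, affordance |
37 | Grid-augmented vision: A simple yet effective approach for enhanced spatial understanding in multi-modal agents | Proposes a grid-augmented vision method to improve the spatial understanding of multimodal agents. | manipulation, scene understanding, spatial relationship |
38 | AdaVLN: Towards Visual Language Navigation in Continuous Indoor Environments with Moving Humans | Proposes AdaVLN for visual language navigation in dynamic indoor environments. | sim-to-real, VLN |
39 | Graph Canvas for Controllable 3D Scene Generation | Proposes the GraphCanvas3D framework for controllable, dynamically adaptive 3D scene generation. | manipulation, spatial relationship |
40 | AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers | AC3D: improves video generation quality by analyzing and improving 3D camera control in video diffusion transformers. | manipulation |
41 | ROICtrl: Boosting Instance Control for Visual Generation | ROICtrl: enhances visual generation through regional instance control, addressing the limitations of text in describing complex scenes. | manipulation |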

🔬 Pillar 4: Generative Motion (4 papers)

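# | Title | One-sentence summary | Tags | 🔗
42 | OOD-HOI: Text-Driven 3D Whole-Body Human-Object Interactions Generation Beyond Training Domains | Proposes the OOD-HOI framework, which addresses out-of-domain generalization in text-driven 3D whole-body human-object interaction generation. | physically plausible, human-object interaction, HOI |
43 | Lifting Motion to the 3D World via 2D Diffusion | MVLift: uses 2D diffusion models to learn 3D motion estimation from 2D data alone. | motion diffusion model, motion diffusion, human-object interaction |
44 | XR-MBT: Multi-modal Full Body Tracking for XR through Self-Supervision with Learned Depth Point Cloud Registration | Proposes XR-MBT, which uses self-supervised learning and depth point cloud registration for multi-modal full-body tracking on XR devices. | motion synthesis, egocentric |
45 | PersonaCraft: Personalized and Controllable Full-Body Multi-Human Scene Generation Using Occlusion-Aware 3D-Conditioned Diffusion | PersonaCraft: personalized full-body multi-human scene generation based on occlusion-aware 3D-conditioned diffusion. | classifier-free guidance, SMPL-X |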

🔬 Pillar 8: Physics-based Animation (2 papers)

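# | Title | One-sentence summary | Tags | 🔗
46 | Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling | Proposes Spatiotemporal Skip Guidance (STG), which improves the sampling quality of video diffusion models without additional training. | spatiotemporal |
47 | EventCrab: Harnessing Frame and Point Synergy for Event-based Action Recognition and Beyond | EventCrab: an event-camera action recognition framework that fuses frame and point information. | spatiotemporal |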

🔬 Pillar 5: Interaction & Reaction (1 paper)

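# | Title | One-sentence summary | Tags | 🔗
48 | VLM-HOI: Vision Language Models for Interpretable Human-Object Interaction Analysis | Proposes VLM-HOI, which uses vision-language models for interpretable human-object interaction analysis. | human-object interaction, HOI |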

🔬 Pillar 6: Video Extraction & Matching (1 paper)

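# | Title | One-sentence summary | Tags | 🔗
49 | HyperGLM: HyperGraph for Video Scene Graph Generation and Anticipation | HyperGLM: uses hypergraphs to enhance multimodal LLMs for video scene graph generation and anticipation. | egocentric, spatial relationship, multimodal |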
