cs.CV (2024-11-28)

📊 27 papers in total | 🔗 7 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (10 🔗3) · Pillar 9: Embodied Foundation Models (10 🔗3) · Pillar 2: RL & Architecture (3) · Pillar 7: Motion Retargeting (2 🔗1) · Pillar 4: Generative Motion (1) · Pillar 6: Video Extraction (1)

🔬 Pillar 3: Perception & Semantics (10 papers)

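# | Title | One-line summary | Tags | 🔗
1 | Gaussians-to-Life: Text-Driven Animation of 3D Gaussian Splatting Scenes | Gaussians2Life: proposes a text-driven method for animating 3D Gaussian Splatting scenes. | 3D gaussian splatting, gaussian splatting, splatting |
2 | InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception | InstanceGaussian: a joint appearance-semantic Gaussian representation for 3D instance-level perception. | 3D gaussian splatting, 3DGS, gaussian splatting |
3 | Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation | Talk2DINO: bridges self-supervised vision backbones with language to enable open-vocabulary segmentation. | open-vocabulary, open vocabulary |
4 | AGS-Mesh: Adaptive Gaussian Splatting and Meshing with Geometric Priors for Indoor Room Reconstruction Using Smartphones | Proposes AGS-Mesh, which adaptively optimizes Gaussian splatting with geometric priors for high-accuracy indoor scene reconstruction from smartphones. | gaussian splatting, splatting |
5 | SuperGaussians: Enhancing Gaussian Splatting Using Primitives with Spatially Varying Colors | SuperGaussians: enhances Gaussian splatting using primitives with spatially varying colors. | gaussian splatting, splatting |
6 | On-chip Hyperspectral Image Segmentation with Fully Convolutional Networks for Scene Understanding in Autonomous Driving | Uses on-chip fully convolutional networks for hyperspectral image segmentation to support scene understanding in autonomous driving. | scene understanding, HSI |
7 | RIGI: Rectifying Image-to-3D Generation Inconsistency via Uncertainty-aware Learning | RIGI: rectifies inconsistency in image-to-3D generation via uncertainty-aware learning. | 3D gaussian splatting, 3DGS, gaussian splatting |
8 | 360Recon: An Accurate Reconstruction Method Based on Depth Fusion from 360 Images | 360Recon: proposes an accurate 3D reconstruction method for 360-degree images based on depth fusion. | depth estimation, scene reconstruction, geometric consistency |
9 | Video Depth without Video Models | Proposes RollingDepth, which uses a single-image LDM for efficient and accurate depth estimation on long videos. | depth estimation, foundation model |
10 | SceneTAP: Scene-Coherent Typographic Adversarial Planner against Vision-Language Models in Real-World Environments | SceneTAP: proposes scene-coherent typographic adversarial attacks against vision-language models in real-world environments. | scene understanding, chain-of-thought |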

🔬 Pillar 9: Embodied Foundation Models (10 papers)

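# | Title | One-line summary | Tags | 🔗
11 | Enhancing Few-Shot Vision-Language Classification with Large Multimodal Model Features | Proposes Sparse Attention Vectors (SAVs) to improve large models on few-shot vision-language classification tasks. | multimodal |
12 | Automatic Prompt Generation and Grounding Object Detection for Zero-Shot Image Anomaly Detection | Proposes a training-free method for industrial image anomaly detection based on multimodal zero-shot learning. | large language model, foundation model, multimodal |
13 | Understanding and Improving Training-Free AI-Generated Image Detections with Vision Foundation Models | Uses vision foundation models to improve AI-generated image detection without training. | foundation model |
14 | Perception of Visual Content: Differences Between Humans and Foundation Models | Compares how humans and AI perceive visual content and shows how the differences affect model performance and bias. | foundation model |
15 | Libra: Leveraging Temporal Images for Biomedical Radiology Analysis | Libra: leverages temporal images for biomedical radiology analysis to improve report generation quality. | large language model, multimodal |
16 | CLIP meets DINO for Tuning Zero-Shot Classifier using Unlabeled Image Collections | Proposes NoLA, which tunes a CLIP zero-shot classifier on unlabeled image collections and incorporates DINO visual features. | large language model, foundation model |
17 | Detailed Object Description with Controllable Dimensions | Proposes Dimension Tailor to improve how multimodal large language models describe objects along controllable dimensions. | large language model, multimodal |
18 | Enhancing Parameter-Efficient Fine-Tuning of Vision Transformers through Frequency-Based Adaptation | FreqFit: enhances parameter-efficient fine-tuning of Vision Transformers through frequency-domain adaptation. | foundation model |
19 | Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads | Orthus: an autoregressive model for interleaved image-text generation with modality-specific heads. | multimodal |
20 | ObjectRelator: Enabling Cross-View Object Relation Understanding Across Ego-Centric and Exo-Centric Perspectives | ObjectRelator: understands object relations across ego-centric and exo-centric views through multimodal fusion and cross-view alignment. | multimodal |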

🔬 Pillar 2: RL & Architecture (3 papers)

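# | Title | One-line summary | Tags | 🔗
21 | PCDreamer: Point Cloud Completion Through Multi-view Diffusion Priors | PCDreamer: completes point clouds using multi-view diffusion priors. | dreamer |
22 | Video Set Distillation: Information Diversification and Temporal Densification | Proposes a video set distillation method to address redundancy in video data. | distillation |
23 | SAMa: Material-aware 3D Selection and Segmentation | SAMa: proposes a material-aware 3D selection and segmentation method that makes 3D asset editing more efficient. | contrastive learning, NeRF |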

🔬 Pillar 7: Motion Retargeting (2 papers)

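# | Title | One-line summary | Tags | 🔗
24 | SOWing Information: Cultivating Contextual Coherence with MLLMs in Image Generation | Proposes SOW, which uses MLLMs to achieve contextual coherence in image generation, improving detail preservation and regional consistency. | spatial relationship, large language model, multimodal |
25 | OpenHumanVid: A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation | OpenHumanVid: a large-scale, high-quality dataset for human-centric video generation that improves generation quality. | human motion |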

🔬 Pillar 4: Generative Motion (1 paper)

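# | Title | One-line summary | Tags | 🔗
26 | BiPO: Bidirectional Partial Occlusion Network for Text-to-Motion Synthesis | Proposes BiPO, a bidirectional partial occlusion network that improves text-to-motion synthesis. | text-to-motion, motion synthesis, motion generation |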

🔬 Pillar 6: Video Extraction (1 paper)

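# | Title | One-line summary | Tags | 🔗
27 | HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos | HOT3D: presents the first dataset for 3D hand and object tracking from multi-view head-mounted (egocentric) video. | MANO, egocentric |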
