cs.CV(2024-12-06)

📊 27 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (9 🔗1) · Pillar 2: RL & Architecture (7 🔗3) · Pillar 3: Perception & Semantics (6 🔗1) · Pillar 1: Robot Control (3) · Pillar 6: Video Extraction (1) · Pillar 4: Generative Motion (1)

🔬 Pillar 9: Embodied Foundation Models (9 papers)

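# | Title | One-sentence summary | Tags
1 | Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | InternVL 2.5: pushes the performance boundaries of open-source multimodal models through model, data, and test-time scaling | large language model, multimodal, visual grounding
2 | CompCap: Improving Multimodal Large Language Models with Composite Captions | Proposes the CompCap framework to improve multimodal large language models' understanding of composite images | large language model, multimodal
3 | Verb Mirage: Unveiling and Assessing Verb Concept Hallucinations in Multimodal Large Language Models | Reveals and assesses, for the first time, verb concept hallucinations in multimodal large language models, and proposes a knowledge-enhanced mitigation method | large language model, multimodal
4 | LinVT: Empower Your Image-level Large Language Model to Understand Videos | Proposes LinVT, enabling image-level large language models to understand video content | large language model
5 | LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation | LoRA.rar: learns to merge LoRAs via hypernetworks for subject-style conditioned image generation | large language model, multimodal
6 | Gla-AI4BioMed at RRG24: Visual Instruction-tuned Adaptation for Radiology Report Generation | Proposes a radiology report generation model based on visual instruction tuning, improving chest X-ray understanding and report generation | large language model, multimodal
7 | Text to Blind Motion | Proposes the BlindWays dataset to improve 3D motion models' prediction of how blind people move | multimodal
8 | Sparse autoencoders reveal selective remapping of visual concepts during adaptation | Proposes PatchSAE, revealing the selective remapping of visual concepts during model adaptation | foundation model
9 | SMIC: Semantic Multi-Item Compression based on CLIP dictionary | Proposes a semantic multi-item compression method based on a CLIP dictionary, improving the compression rate for image collections | foundation model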

🔬 Pillar 2: RL & Architecture (7 papers)

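# | Title | One-sentence summary | Tags
10 | SoPo: Text-to-Motion Generation Using Semi-Online Preference Optimization | Proposes SoPo, a semi-online preference optimization method for text-to-motion generation | DPO, MDM, text-to-motion
11 | Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction | Proposes a momentum Gaussian self-distillation method for high-quality large-scale scene reconstruction | distillation, 3D gaussian splatting, gaussian splatting
12 | EACO: Enhancing Alignment in Multimodal LLMs via Critical Observation | EACO: enhances alignment in multimodal LLMs via critical observation | DPO, direct preference optimization, large language model
13 | PanoDreamer: Optimization-Based Single Image to 360 3D Scene With Diffusion | PanoDreamer: an optimization-based method that turns a single image into a 360° 3D scene using diffusion models | dreamer, depth estimation, scene reconstruction
14 | Birth and Death of a Rose | Uses pretrained 2D diffusion models to generate object intrinsic properties that evolve over time, such as a rose blooming | distillation, foundation model
15 | Salvaging the Overlooked: Leveraging Class-Aware Contrastive Learning for Multi-Class Anomaly Detection | Proposes class-aware contrastive learning to resolve inter-class confusion in multi-class anomaly detection | contrastive learning
16 | SimC3D: A Simple Contrastive 3D Pretraining Framework Using RGB Images | SimC3D: a simple contrastive 3D pretraining framework based on RGB images that improves downstream task performance | contrastive learning, depth estimation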

🔬 Pillar 3: Perception & Semantics (6 papers)

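# | Title | One-sentence summary | Tags
17 | Pushing Rendering Boundaries: Hard Gaussian Splatting | Proposes Hard Gaussian Splatting to address artifacts in 3DGS and improve novel view synthesis quality | 3D gaussian splatting, 3DGS, gaussian splatting
18 | MixedGaussianAvatar: Realistically and Geometrically Accurate Head Avatar via Mixed 2D-3D Gaussians | Proposes MixedGaussianAvatar, which uses mixed 2D-3D Gaussians for realistic and geometrically accurate head avatar reconstruction | 3D gaussian splatting, 3DGS, gaussian splatting
19 | $S^3$: Synonymous Semantic Space for Improving Zero-Shot Generalization of Vision-Language Models | Proposes the Synonymous Semantic Space ($S^3$) to improve the zero-shot generalization of vision-language models | open-vocabulary, open vocabulary, large language model
20 | Extrapolated Urban View Synthesis Benchmark | Proposes the EUVS benchmark for evaluating the generalization of extrapolated view synthesis algorithms in urban scenes | 3D gaussian splatting, gaussian splatting, splatting
21 | Perturb-and-Revise: Flexible 3D Editing with Generative Trajectories | Perturb-and-Revise: a flexible NeRF 3D editing method based on generative trajectories | NeRF
22 | Spatially-Adaptive Hash Encodings For Neural Surface Reconstruction | Proposes spatially-adaptive hash encodings for neural surface reconstruction, achieving more accurate geometry recovery | scene reconstruction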

🔬 Pillar 1: Robot Control (3 papers)

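# | Title | One-sentence summary | Tags
23 | BimArt: A Unified Approach for the Synthesis of 3D Bimanual Interaction with Articulated Objects | BimArt: a unified method for synthesizing 3D bimanual interaction with articulated objects | manipulation, bi-manual
24 | DreamColour: Controllable Video Colour Editing without Training | DreamColour: a training-free, controllable video colour editing framework that improves editing quality and efficiency | manipulation
25 | How to Squeeze An Explanation Out of Your Model | Proposes a model-agnostic interpretability method based on SE blocks, applicable to image and video/multimodal biometric recognition | manipulation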

🔬 Pillar 6: Video Extraction (1 paper)

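# | Title | One-sentence summary | Tags
26 | GS-Matching: Reconsidering Feature Matching task in Point Cloud Registration | Proposes the GS-Matching strategy to address non-optimal feature matching in point cloud registration | feature matching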

🔬 Pillar 4: Generative Motion (1 paper)

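# | Title | One-sentence summary | Tags
27 | CigTime: Corrective Instruction Generation Through Inverse Motion Editing | CigTime: generates corrective instructions through inverse motion editing, for motion skill learning | motion generation, large language model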
