cs.CV (2024-05-27)

📊 37 papers in total | 🔗 12 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (13 🔗5) · Pillar 2: RL & Architecture (9 🔗2) · Pillar 9: Embodied Foundation Models (7 🔗1) · Pillar 7: Motion Retargeting (3 🔗2) · Pillar 1: Robot Control (2) · Pillar 4: Generative Motion (1 🔗1) · Pillar 6: Video Extraction (1) · Pillar 5: Interaction & Reaction (1 🔗1)

🔬 Pillar 3: Perception & Semantics (13 papers)

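| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | F-3DGS: Factorized Coordinates and Representations for 3D Gaussian Splatting | Proposes Factorized 3D Gaussian Splatting (F-3DGS), which significantly reduces storage requirements while preserving image quality. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 2 | GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane | GOI: finds 3D Gaussians of interest using an optimizable semantic-space hyperplane, enabling open-vocabulary scene understanding. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 3 | DOF-GS: Adjustable Depth-of-Field 3D Gaussian Splatting for Post-Capture Refocusing, Defocus Rendering and Blur Removal | Proposes DOF-GS, which uses adjustable depth-of-field 3D Gaussian Splatting for post-capture refocusing, defocus rendering, and blur removal. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 4 | PyGS: Large-scale Scene Representation with Pyramidal 3D Gaussian Splatting | Proposes PyGS, a large-scale scene representation method based on pyramidal 3D Gaussian Splatting. | 3D gaussian splatting, gaussian splatting, splatting | |
| 5 | DC-Gaussian: Improving 3D Gaussian Splatting for Reflective Dash Cam Videos | DC-Gaussian: improves 3D Gaussian Splatting for novel view synthesis from dash cam videos with heavy reflections. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 6 | SA-GS: Semantic-Aware Gaussian Splatting for Large Scene Reconstruction with Geometry Constrain | Proposes SA-GS, a semantic-aware Gaussian Splatting method for geometry-constrained large-scene reconstruction. | gaussian splatting, splatting, scene reconstruction | |
| 7 | DCPI-Depth: Explicitly Infusing Dense Correspondence Prior to Unsupervised Monocular Depth Estimation | DCPI-Depth: improves unsupervised monocular depth estimation by explicitly infusing a dense correspondence prior. | depth estimation, monocular depth, metric depth | |
| 8 | Consistency Regularisation for Unsupervised Domain Adaptation in Monocular Depth Estimation | Proposes an unsupervised domain adaptation method for monocular depth estimation based on consistency regularisation. | depth estimation, monocular depth | |
| 9 | A Comparative Study on Multi-task Uncertainty Quantification in Semantic Segmentation and Monocular Depth Estimation | Multi-task uncertainty quantification: deep ensembles improve semantic segmentation and monocular depth estimation. | depth estimation, monocular depth | |
| 10 | MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds | Proposes MoSca, which reconstructs dynamic scenes from monocular videos and synthesizes novel views via 4D motion scaffolds. | gaussian splatting, splatting | |
| 11 | All-day Depth Completion | Proposes an all-day depth completion method that improves depth estimation accuracy in low-light conditions through multi-sensor fusion and uncertainty-guided residual learning. | depth estimation | |
| 12 | CoCoGesture: Toward Coherent Co-speech 3D Gesture Generation in the Wild | CoCoGesture: proposes a framework for generating coherent speech-driven 3D gestures in the wild. | MoGe | |
| 13 | SCSim: A Realistic Spike Cameras Simulator | SCSim: proposes a more realistic spike camera simulator, addressing the lack of realism in existing datasets. | optical flow | |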

🔬 Pillar 2: RL & Architecture (9 papers)

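| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 14 | Estimating Depth of Monocular Panoramic Image with Teacher-Student Model Fusing Equirectangular and Spherical Representations | Proposes a teacher-student model that fuses equirectangular and spherical representations for monocular panoramic depth estimation. | teacher-student, depth estimation, monocular depth | |
| 15 | Memorize What Matters: Emergent Scene Decomposition from Multitraverse | Proposes a self-supervised scene decomposition framework based on 3D Gaussian mapping for persistent environment perception in robots. | distillation, 3D gaussian splatting, gaussian splatting | |
| 16 | ContrastAlign: Toward Robust BEV Feature Alignment via Contrastive Learning for Multi-Modal 3D Object Detection | ContrastAlign: uses contrastive learning for robust BEV feature alignment, improving multi-modal 3D object detection performance. | contrastive learning, depth estimation | |
| 17 | UniCompress: Enhancing Multi-Data Medical Image Compression with Knowledge Distillation | UniCompress: a knowledge-distillation-based method for multi-data medical image compression. | distillation, implicit representation | |
| 18 | Sports-Traj: A Unified Trajectory Generation Model for Multi-Agent Movement in Sports | Proposes UniTraj, a unified trajectory generation model that addresses multi-agent trajectory prediction, imputation, and spatio-temporal recovery in sports. | Mamba, SSM, state space model | |
| 19 | TIMA: Text-Image Mutual Awareness for Balancing Zero-Shot Adversarial Robustness and Generalization Ability | Proposes TIMA, which balances the zero-shot adversarial robustness and generalization ability of CLIP models. | distillation, foundation model | |
| 20 | LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence | Proposes LARM, a large auto-regressive model for long-horizon embodied intelligence that addresses the reward vanishing problem. | reinforcement learning, large language model | |
| 21 | LCM: Locally Constrained Compact Point Cloud Model for Masked Point Modeling | Proposes the Locally Constrained Compact point cloud Model (LCM), improving the efficiency and performance of Masked Point Modeling. | Mamba, MAE | |
| 22 | TokenUnify: Scaling Up Autoregressive Pretraining for Neuron Segmentation | TokenUnify: scales up neuron segmentation capability through autoregressive pretraining. | Mamba, MAE | |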

🔬 Pillar 9: Embodied Foundation Models (7 papers)

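| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 23 | Matryoshka Multimodal Models | Proposes M3 (Matryoshka Multimodal Models), which uses nested visual tokens to provide controllable visual granularity and improved efficiency. | large language model, multimodal | |
| 24 | Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model | Reason3D: uses a large language model for searching and reasoning in 3D segmentation. | large language model, multimodal | |
| 25 | TIE: Revolutionizing Text-based Image Editing for Complex-Prompt Following and High-Fidelity Editing | TIE: uses multimodal LLMs and CoT reasoning to rework high-fidelity image editing under complex prompts. | large language model, multimodal, chain-of-thought | |
| 26 | Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning | Proposes a geometrical structure consistency learning method to mitigate noisy correspondence in multimodal data. | multimodal | |
| 27 | VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models | Proposes the VoCoT framework, which improves the multi-step reasoning ability of large models on visual reasoning tasks. | chain-of-thought | |
| 28 | Multilingual Diversity Improves Vision-Language Representations | Uses multilingual data to enhance vision-language representations, improving model performance on English vision tasks. | multimodal | |
| 29 | Concept Matching with Agent for Out-of-Distribution Detection | Proposes CMA, an agent-based concept matching method that improves the robustness and adaptability of OOD detection. | large language model | |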

🔬 Pillar 7: Motion Retargeting (3 papers)

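| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 30 | LLM-Optic: Unveiling the Capabilities of Large Language Models for Universal Visual Grounding | LLM-Optic: uses large language models for universal visual grounding without additional training. | spatial relationship, large language model, multimodal | |
| 31 | Human4DiT: 360-degree Human Video Generation with 4D Diffusion Transformer | Proposes Human4DiT, which uses a 4D diffusion transformer to generate high-quality 360-degree human videos. | human motion | |
| 32 | Controllable Longer Image Animation with Diffusion Models | Proposes a diffusion-model-based image animation method that enables controllable generation of longer videos. | motion prediction | |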

🔬 Pillar 1: Robot Control (2 papers)

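| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 33 | A Self-Correcting Vision-Language-Action Model for Fast and Slow System Manipulation | Proposes a self-correcting vision-language-action model that improves the robustness and accuracy of robot manipulation. | manipulation, policy learning, vision-language-action | |
| 34 | Sync4D: Video Guided Controllable Dynamics for Physics-Based 4D Generation | Sync4D: video-guided controllable dynamics for physics-based 4D generation. | quadruped | |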

🔬 Pillar 4: Generative Motion (1 paper)

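| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 35 | Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs | Proposes Motion-Agent, a conversational framework that uses LLMs for general human motion generation, editing, and understanding. | motion generation, human motion, human motion generation | |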

🔬 Pillar 6: Video Extraction (1 paper)

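| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 36 | A Cross-Dataset Study for Text-based 3D Human Motion Retrieval | Presents a cross-dataset generalization study of text-based 3D human motion retrieval, revealing dataset bias. | SMPL, motion retrieval, human motion | |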

🔬 Pillar 5: Interaction & Reaction (1 paper)

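| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 37 | BAISeg: Boundary Assisted Weakly Supervised Instance Segmentation | BAISeg: proposes a boundary-assisted weakly supervised instance segmentation method that requires no instance-level annotations. | mutual attention | |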
