cs.CV(2024-12-20)

📊 24 papers in total | 🔗 9 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (8 🔗2) · Pillar 9: Embodied Foundation Models (8 🔗3) · Pillar 2: RL & Architecture (6 🔗3) · Pillar 4: Generative Motion (1) · Pillar 7: Motion Retargeting (1 🔗1)

🔬 Pillar 3: Perception & Semantics (8 papers)

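| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 1 | CoCoGaussian: Leveraging Circle of Confusion for Gaussian Splatting from Defocused Images | CoCoGaussian: uses the circle of confusion for 3D Gaussian Splatting from defocused images. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 2 | EGSRAL: An Enhanced 3D Gaussian Splatting based Renderer with Automated Labeling for Large-Scale Driving Scene | EGSRAL: an enhanced 3D Gaussian Splatting-based renderer with automated labeling for large-scale driving scenes. | 3D gaussian splatting, gaussian splatting, splatting | |
| 3 | IRGS: Inter-Reflective Gaussian Splatting with 2D Gaussian Ray Tracing | Proposes IRGS, which uses 2D Gaussian ray tracing to render accurate inter-reflections in 3D Gaussian Splatting. | 3DGS, gaussian splatting, splatting | |
| 4 | NeRF-To-Real Tester: Neural Radiance Fields as Test Image Generators for Vision of Autonomous Systems | Proposes N2R-Tester, which uses NeRF to generate diverse test images for improving the robustness of vision components in autonomous systems. | NeRF, neural radiance field | |
| 5 | DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment | DINOv2.txt: a unified framework for image- and pixel-level vision-language alignment. | open-vocabulary, open vocabulary, foundation model | |
| 6 | Interactive Scene Authoring with Specialized Generative Primitives | Proposes an interactive scene authoring framework based on generative primitives that simplifies 3D scene design for non-expert users. | 3D gaussian splatting, gaussian splatting, splatting | |
| 7 | MotiF: Making Text Count in Image Animation with Motion Focal Loss | MotiF: improves text-guided image animation with a motion focal loss. | optical flow, motion generation | |
| 8 | Improving Object Detection for Time-Lapse Imagery Using Temporal Features in Wildlife Monitoring | Uses temporal features to improve object detection in time-lapse imagery for wildlife monitoring. | scene understanding | |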

🔬 Pillar 9: Embodied Foundation Models (8 papers)

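| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 9 | MiniGPT-Pancreas: Multimodal Large Language Model for Pancreas Cancer Classification and Detection | MiniGPT-Pancreas: a multimodal large language model for pancreatic cancer classification and detection. | large language model, multimodal | |
| 10 | J-EDI QA: Benchmark for deep-sea organism-specific multimodal LLM | Proposes J-EDI QA, a multimodal LLM benchmark for deep-sea organisms that evaluates how well models understand deep-sea species. | large language model, multimodal | |
| 11 | PruneVid: Visual Token Pruning for Efficient Video Large Language Models | PruneVid: visual token pruning for efficient video large language models. | large language model | |
| 12 | Precision ICU Resource Planning: A Multimodal Model for Brain Surgery Outcomes | Proposes a multimodal-fusion model that predicts brain surgery outcomes to optimize ICU resource allocation. | multimodal | |
| 13 | A High-Quality Text-Rich Image Instruction Tuning Dataset via Hybrid Instruction Generation | LLaVAR-2: a high-quality text-rich image instruction tuning dataset built via hybrid instruction generation. | large language model, multimodal | |
| 14 | Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage | Proposes CapMAS, a multiagent collaboration framework with dual evaluation metrics, to improve the factuality and coverage of hyper-detailed image captions. | large language model, multimodal | |
| 15 | HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding | HoVLE: unleashes the potential of monolithic vision-language models through holistic vision-language embedding. | large language model | |
| 16 | VORD: Visual Ordinal Calibration for Mitigating Object Hallucinations in Large Vision-Language Models | VORD: mitigates object hallucinations in large vision-language models through visual ordinal calibration. | large language model | |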

🔬 Pillar 2: RL & Architecture (6 papers)

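| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 17 | LiRCDepth: Lightweight Radar-Camera Depth Estimation via Knowledge Distillation and Uncertainty Guidance | Proposes LiRCDepth, a lightweight radar-camera depth estimation model that improves performance through knowledge distillation and uncertainty guidance. | MAE, distillation, depth estimation | |
| 18 | Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking | Proposes STTrack, which exploits multimodal spatial-temporal patterns to improve video object tracking in complex scenes. | Mamba, multimodal | |
| 19 | Mamba2D: A Natively Multi-Dimensional State-Space Model for Vision Tasks | Mamba2D: a natively multi-dimensional state-space model for vision tasks. | Mamba, SSM | |
| 20 | SaliencyI2PLoc: saliency-guided image-point cloud localization using contrastive learning | SaliencyI2PLoc: cross-modal image-to-point-cloud localization using saliency guidance and contrastive learning. | contrastive learning | |
| 21 | DOLLAR: Few-Step Video Generation via Distillation and Latent Reward Optimization | Proposes DOLLAR, a framework based on distillation and latent reward optimization for few-step, high-quality video generation. | distillation | |
| 22 | 3D Shape Tokenization via Latent Flow Matching | Proposes a flow-matching-based 3D shape tokenization method that yields a learning-friendly probability density representation of 3D surfaces. | flow matching | |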

🔬 Pillar 4: Generative Motion (1 paper)

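| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 23 | SCENIC: Scene-aware Semantic Navigation with Instruction-guided Control | SCENIC: a scene-aware semantic navigation model for realistic human motion generation under instruction-guided control. | motion synthesis, physically plausible | |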

🔬 Pillar 7: Motion Retargeting (1 paper)

| # | Title | Key point | Tags | 🔗 |
|---|---|---|---|---|
| 24 | Can Generative Video Models Help Pose Estimation? | Proposes InterPose to address pose estimation between images. | spatial relationship | |
