cs.CV (2024-12-27)

📊 18 papers in total | 🔗 10 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (7 🔗5) · Pillar 9: Embodied Foundation Models (7 🔗2) · Pillar 5: Interaction & Reaction (1 🔗1) · Pillar 1: Robot Control (1 🔗1) · Pillar 2: RL & Architecture (1) · Pillar 6: Video Extraction (1 🔗1)

🔬 Pillar 3: Perception & Semantics (7 papers)

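# | Title | One-sentence summary | Tags | 🔗
1 | MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios | Proposes the MLLM-SUL framework, which uses a multimodal large language model for semantic scene understanding and risk localization in traffic scenarios. | scene understanding, large language model, multimodal |
2 | DAS3R: Dynamics-Aware Gaussian Splatting for Static Scene Reconstruction | DAS3R: proposes a dynamics-aware Gaussian splatting method for static scene reconstruction. | gaussian splatting, splatting, scene reconstruction |
3 | Dust to Tower: Coarse-to-Fine Photo-Realistic Scene Reconstruction from Sparse Uncalibrated Images | Proposes Dust to Tower for scene reconstruction from sparse uncalibrated images. | 3D gaussian splatting, 3DGS, gaussian splatting |
4 | Learning Radiance Fields from a Single Snapshot Compressive Image | Proposes SCINeRF and SCISplat, which learn radiance fields from a single snapshot compressive image, enabling high-quality 3D reconstruction and fast rendering. | 3D gaussian splatting, 3DGS, gaussian splatting |
5 | Towards Open-Vocabulary Remote Sensing Image Semantic Segmentation | Proposes the GSNet framework and the LandDiscover50K dataset for open-vocabulary semantic segmentation of remote sensing images. | open-vocabulary, open vocabulary |
6 | Sharpening Neural Implicit Functions with Frequency Consolidation Priors | Proposes frequency consolidation priors to improve the performance of neural implicit functions. | implicit representation |
7 | Generalized Uncertainty-Based Evidential Fusion with Hybrid Multi-Head Attention for Weak-Supervised Temporal Action Localization | Proposes generalized uncertainty-based evidential fusion with hybrid multi-head attention to address action-background confusion in weakly supervised temporal action localization. | optical flow |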

🔬 Pillar 9: Embodied Foundation Models (7 papers)

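# | Title | One-sentence summary | Tags | 🔗
8 | CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs | CAD-GPT: synthesizes CAD construction sequences with spatial reasoning-enhanced multimodal LLMs. | large language model, multimodal |
9 | Not all Views are Created Equal: Analyzing Viewpoint Instabilities in Vision Foundation Models | Analyzes viewpoint instabilities in vision foundation models, revealing their generalization gap on 3D reasoning tasks. | foundation model |
10 | A Large-scale Interpretable Multi-modality Benchmark for Facial Image Forgery Localization | Proposes the MMTT dataset and the ForgeryTalker model for interpretable facial image forgery localization. | large language model, multimodal |
11 | From Elements to Design: A Layered Approach for Automatic Graphic Design Composition | Proposes LaDeCo for automatic graphic design composition. | multimodal |
12 | MBQ: Modality-Balanced Quantization for Large Vision-Language Models | Proposes Modality-Balanced Quantization (MBQ) to improve the post-quantization accuracy of large vision-language models. | large language model |
13 | Temporal Context Consistency Above All: Enhancing Long-Term Anticipation by Learning and Enforcing Temporal Constraints | Proposes a temporal context consistency method to enhance long-term action anticipation. | large language model |
14 | MINIMA: Modality Invariant Image Matching | MINIMA: proposes a modality-invariant image matching framework to address generalization in cross-modal image matching. | multimodal |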

🔬 Pillar 5: Interaction & Reaction (1 paper)

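# | Title | One-sentence summary | Tags | 🔗
15 | Interacted Object Grounding in Spatio-Temporal Human-Object Interactions | Proposes the GIO benchmark and the 4D-QA framework for open-world object grounding in spatio-temporal human-object interactions. | human-object interaction, HOI |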

🔬 Pillar 1: Robot Control (1 paper)

# | Title | One-sentence summary | Tags | 🔗
16 | MVTamperBench: Evaluating Robustness of Vision-Language Models | MVTamperBench: evaluates the robustness of vision-language models against video tampering. | manipulation, large language model, multimodal |

🔬 Pillar 2: RL & Architecture (1 paper)

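# | Title | One-sentence summary | Tags | 🔗
17 | Image Classification with Deep Reinforcement Active Learning | Proposes a deep reinforcement learning-based active learning method to address annotation scarcity in image classification. | reinforcement learning, deep reinforcement learning |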

🔬 Pillar 6: Video Extraction (1 paper)

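# | Title | One-sentence summary | Tags | 🔗
18 | Optimizing Local-Global Dependencies for Accurate 3D Human Pose Estimation | Proposes the SSR-STF dual-stream model, which optimizes local-global dependencies to improve 3D human pose estimation accuracy. | human mesh recovery |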
