cs.CV(2024-12-13)

📊 39 papers in total | 🔗 8 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (17 🔗5) | Pillar 2: RL & Architecture (8) | Pillar 3: Perception & Semantics (8 🔗1) | Pillar 4: Generative Motion (4 🔗2) | Pillar 8: Physics-based Animation (1) | Pillar 1: Robot Control (1)

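🔬 Pillar 9: Embodied Foundation Models (17 papers)

# | Title | One-line summary | Tags | 🔗
1 | Enhancing Multimodal Large Language Models Complex Reason via Similarity Computation | Proposes Simignore to improve the complex reasoning ability of multimodal large language models | large language model, multimodal, chain-of-thought
2 | EVLM: Self-Reflective Multimodal Reasoning for Cross-Dimensional Visual Editing | EVLM: cross-dimensional visual editing via self-reflective multimodal reasoning | multimodal, chain-of-thought
3 | DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding | DeepSeek-VL2: mixture-of-experts vision-language models for advanced multimodal understanding | multimodal, visual grounding
4 | DEFAME: Dynamic Evidence-based FAct-checking with Multimodal Experts | DEFAME: a fact-checking framework based on dynamic evidence and multimodal experts that significantly improves verification performance on mixed text-image content | multimodal
5 | Apollo: An Exploration of Video Understanding in Large Multimodal Models | Apollo: explores video understanding in large multimodal models and proposes efficient training strategies | multimodal
6 | Robust image classification with multi-modal large language models | Proposes MultiShield, which uses multimodal large language models to make image classifiers more robust to adversarial attacks | large language model
7 | CognitionCapturer: Decoding Visual Stimuli From Human EEG Signal With Multimodal Information | CognitionCapturer: decodes visual stimuli from human EEG signals using multimodal information | multimodal
8 | Learning Complex Non-Rigid Image Edits from Multimodal Conditioning | Proposes an image editing method driven by multimodal conditioning that supports person insertion and pose editing | multimodal
9 | A multimodal dataset for understanding the impact of mobile phones on remote online virtual education | IMPROVE: a multimodal dataset for understanding the impact of mobile phone use on remote online education | multimodal
10 | B-VLLM: A Vision Large Language Model with Balanced Spatio-Temporal Tokens | Proposes B-VLLM to address the visual token count problem in long-video understanding | large language model
11 | All-in-One: Transferring Vision Foundation Models into Stereo Matching | AIO-Stereo: transfers vision foundation models to stereo matching, achieving a performance breakthrough | foundation model
12 | Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining | Iris: a visual agent that breaks GUI complexity through adaptive focus and self-refinement | large language model, multimodal
13 | BrushEdit: All-In-One Image Inpainting and Editing | Proposes BrushEdit, an inpainting-based framework for interactive, instruction-guided image editing | large language model, multimodal
14 | Towards Unified Benchmark and Models for Multi-Modal Perceptual Metrics | Proposes UniSim-Bench, a benchmark for multimodal perceptual metrics, and explores unified multimodal perceptual models | multimodal
15 | Single-Pass Object-Focused Data Selection | Proposes an object-focused data selection method to optimize the annotation budget | foundation model
16 | Dynamic Cross-Modal Alignment for Robust Semantic Location Prediction | Proposes the CoVLA framework to resolve ambiguity and discrepancy in semantic location prediction from multimodal social media | multimodal
17 | CP-DETR: Concept Prompt Guide DETR Toward Stronger Universal Object Detection | CP-DETR: guides DETR with concept prompts toward stronger universal object detection | foundation model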

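🔬 Pillar 2: RL & Architecture (8 papers)

# | Title | One-line summary | Tags | 🔗
18 | SuperGSeg: Open-Vocabulary 3D Segmentation with Structured Super-Gaussians | SuperGSeg: open-vocabulary 3D segmentation using structured Super-Gaussians | distillation, 3D gaussian splatting, gaussian splatting
19 | MulSMo: Multimodal Stylized Motion Generation by Bidirectional Control Flow | MulSMo: multimodal stylized motion generation via bidirectional control flow | contrastive learning, motion generation, multimodal
20 | Predictive Modeling, Pattern Recognition, and Spatiotemporal Representations of Plant Growth in Simulated and Controlled Environments: A Comprehensive Review | Reviews spatiotemporal modeling methods for plant growth, focusing on prediction, pattern recognition, and environment interaction | predictive model, spatiotemporal
21 | XYScanNet: A State Space Model for Single Image Deblurring | Proposes XYScanNet, which uses a state space model and a slice-scanning strategy for single-image deblurring | Mamba, SSM, state space model
22 | A dual contrastive framework | AlignCap: proposes a dual contrastive framework that strengthens region-level visual understanding and improves region captioning performance | contrastive learning, multimodal
23 | Towards Consistent and Efficient Dataset Distillation via Diffusion-Driven Selection | Proposes a dataset distillation method based on diffusion-driven selection, improving consistency and efficiency | distillation
24 | Going Beyond Feature Similarity: Effective Dataset Distillation based on Class-Aware Conditional Mutual Information | Proposes a dataset distillation method based on class-aware conditional mutual information, improving training efficiency and performance | distillation
25 | Selective State Space Memory for Large Vision-Language Models | Proposes SSMI, which fine-tunes large vision-language models efficiently via selective state space memory | Mamba, multimodal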

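🔬 Pillar 3: Perception & Semantics (8 papers)

# | Title | One-line summary | Tags | 🔗
26 | NeRF-Texture: Synthesizing Neural Radiance Field Textures | Proposes NeRF-Texture, which uses neural radiance fields to synthesize textures with 3D meso-scale structure | NeRF, neural radiance field, implicit representation
27 | Prompt-Guided Mask Proposal for Two-Stage Open-Vocabulary Segmentation | Proposes PMP, a prompt-guided mask proposal network for two-stage open-vocabulary segmentation | open-vocabulary, open vocabulary
28 | TSGaussian: Semantic and Depth-Guided Target-Specific Gaussian Splatting from Sparse Views | TSGaussian: target-specific Gaussian splatting reconstruction from sparse views, combining semantic and depth priors | gaussian splatting, splatting
29 | SplineGS: Robust Motion-Adaptive Spline for Real-Time Dynamic 3D Gaussians from Monocular Video | Proposes SplineGS, which uses motion-adaptive splines for real-time dynamic 3D Gaussian reconstruction from monocular video | 3D gaussian splatting, 3DGS, gaussian splatting
30 | GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view Diffusion | Proposes GAF, which uses a multi-view diffusion model to reconstruct high-fidelity, animatable 3D Gaussian head avatars from monocular video | gaussian splatting, splatting
31 | Sharpening Your Density Fields: Spiking Neuron Aided Fast Geometry Learning | Proposes a spiking-neuron-based fast geometry learning method for NeRF, improving the efficiency of geometry extraction | NeRF, neural radiance field
32 | Coherent 3D Scene Diffusion From a Single RGB Image | Proposes a diffusion-based method for coherent 3D scene reconstruction from a single RGB image | scene reconstruction
33 | A Differentiable Wave Optics Model for End-to-End Computational Imaging System Optimization | Proposes a differentiable wave optics model for end-to-end optimization of computational imaging systems | scene reconstruction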

🔬 Pillar 4: Generative Motion (4 papers)

# | Title | One-line summary | Tags | 🔗
34 | EP-CFG: Energy-Preserving Classifier-Free Guidance | Proposes EP-CFG, which uses energy preservation to address CFG over-saturation and over-contrast in diffusion models | classifier-free guidance
35 | The Language of Motion: Unifying Verbal and Non-verbal Language of 3D Human Motion | Proposes a 3D human motion generation framework that unifies verbal and non-verbal language | motion generation, multimodal
36 | VQTalker: Towards Multilingual Talking Avatars through Facial Motion Tokenization | VQTalker: multilingual lip-motion generation via facial motion tokenization | motion generation, motion tokenizer
37 | Quaffure: Real-Time Quasi-Static Neural Hair Simulation | Quaffure: real-time quasi-static neural hair simulation for more realistic avatars | physically plausible

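🔬 Pillar 8: Physics-based Animation (1 paper)

# | Title | One-line summary | Tags | 🔗
38 | Building a Multi-modal Spatiotemporal Expert for Zero-shot Action Recognition with CLIP | Proposes the STDD framework, which uses a spatiotemporal dynamic expert knowledge graph to improve CLIP's zero-shot action recognition performance | spatiotemporal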

🔬 Pillar 1: Robot Control (1 paper)

# | Title | One-line summary | Tags | 🔗
39 | SVGBuilder: Component-Based Colored SVG Generation with Text-Guided Autoregressive Transformers | SVGBuilder: component-based colored SVG generation with text-guided autoregressive Transformers | manipulation
