cs.CV(2024-10-28)

📊 20 papers | 🔗 9 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (7 papers, 🔗 3)
Pillar 3: Spatial Perception & Semantics (6 papers, 🔗 2)
Pillar 2: RL Algorithms & Architecture (3 papers, 🔗 2)
Pillar 4: Generative Motion (2 papers, 🔗 1)
Pillar 8: Physics-based Animation (1 paper, 🔗 1)
Pillar 1: Robot Control (1 paper)

🔬 Pillar 9: Embodied Foundation Models (7 papers)

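| # | Title | One-line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Going Beyond H&E and Oncology: How Do Histopathology Foundation Models Perform for Multi-stain IHC and Immunology? | Evaluates how well histopathology foundation models generalize to multi-stain immunohistochemistry (IHC) and immunology | foundation model | |
| 2 | Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines | Proposes Vision Search Assistant, which enables vision-language models to act as multimodal search engines | multimodal | |
| 3 | SocialGPT: Prompting LLMs for Social Relation Reasoning via Greedy Segment Optimization | Proposes SocialGPT, which uses LLMs for social relation reasoning and optimizes the prompts | large language model, foundation model | |
| 4 | LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior | LARP: a video tokenization method with a learned autoregressive generative prior that improves video generation quality | large language model, multimodal | |
| 5 | Face-MLLM: A Large Face Perception Model | Proposes Face-MLLM, a multimodal large model for face perception | large language model, multimodal | |
| 6 | Large Pre-Training Datasets Don't Always Guarantee Robustness after Fine-Tuning | Shows that the robustness of large pre-trained models degrades after fine-tuning and proposes ImageNet-RIB to evaluate robustness inheritance | foundation model | |
| 7 | Improving Generalization in Visual Reasoning via Self-Ensemble | Proposes Self-Ensemble, a training-free method that improves the generalization of visual reasoning models | multimodal | |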

🔬 Pillar 3: Spatial Perception & Semantics (6 papers)

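| # | Title | One-line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 8 | ODGS: 3D Scene Reconstruction from Omnidirectional Images with 3D Gaussian Splattings | ODGS: 3D scene reconstruction from 360-degree omnidirectional images using 3D Gaussian splatting | 3D gaussian splatting, gaussian splatting, splatting | |
| 9 | Grid4D: 4D Decomposed Hash Encoding for High-Fidelity Dynamic Gaussian Splatting | Grid4D: 4D decomposed hash encoding for high-fidelity dynamic Gaussian splatting | gaussian splatting, splatting | |
| 10 | MVSDet: Multi-View Indoor 3D Object Detection via Efficient Plane Sweeps | MVSDet: multi-view indoor 3D object detection using efficient plane sweeps | gaussian splatting, splatting, NeRF | |
| 11 | OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup | OmniSep: a unified omni-modality sound separation framework built on Query-Mixup that extracts sounds given multimodal queries | open-vocabulary, open vocabulary | |
| 12 | EEG-Driven 3D Object Reconstruction with Style Consistency and Diffusion Prior | Proposes a 3D object reconstruction method driven by EEG signals with a diffusion prior, improving style consistency | NeRF, neural radiance field | |
| 13 | Enhancing Action Recognition by Leveraging the Hierarchical Structure of Actions and Textual Context | Improves action recognition by exploiting the hierarchical structure of actions and textual context | optical flow | |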

🔬 Pillar 2: RL Algorithms & Architecture (3 papers)

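| # | Title | One-line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 14 | ECMamba: Consolidating Selective State Space Model with Retinex Guidance for Efficient Multiple Exposure Correction | Proposes ECMamba, which builds on Mamba and Retinex theory for efficient multiple exposure correction | Mamba, state space model | |
| 15 | CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians | CompGS: exploits 2D compositionality through dynamically optimized 3D Gaussians for compositional text-to-3D generation | distillation, 3D gaussian splatting, gaussian splatting | |
| 16 | Exploring contextual modeling with linear complexity for point cloud segmentation | Proposes MEEPO, which combines CNNs with Mamba to improve point cloud segmentation efficiently | Mamba, SSM, spatial relationship | |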

🔬 Pillar 4: Generative Motion (2 papers)

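| # | Title | One-line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 17 | Skinned Motion Retargeting with Dense Geometric Interaction Perception | Proposes MeshRet, which achieves high-quality motion retargeting for skinned characters by modeling dense geometric interactions | penetration, motion retargeting | |
| 18 | Constrained Transformer-Based Porous Media Generation to Spatial Distribution of Rock Properties | Proposes a constrained Transformer-based porous media generation method for modeling the spatial distribution of rock properties | VQ-VAE, spatial relationship | |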

🔬 Pillar 8: Physics-based Animation (1 paper)

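| # | Title | One-line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 19 | Bidirectional Recurrence for Cardiac Motion Tracking with Gaussian Process Latent Coding | GPTrack: a bidirectional recurrent framework for cardiac motion tracking with Gaussian process latent coding | motion tracking | |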

🔬 Pillar 1: Robot Control (1 paper)

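| # | Title | One-line Summary | Tags | 🔗 |
|---|---|---|---|---|
| 20 | Synthetica: Large Scale Synthetic Data for Robot Perception | Synthetica: large-scale synthetic data for robot perception, enabling fast and robust object detection | sim-to-real | |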
