cs.CV (2024-12-21)
📊 19 papers in total | 🔗 3 with code
🎯 Interest Area Navigation
Pillar 2: RL Algorithms & Architecture (6 🔗1)
Pillar 3: Spatial Perception & Semantics (4 🔗1)
Pillar 9: Embodied Foundation Models (3)
Pillar 4: Generative Motion (2)
Pillar 6: Video Extraction & Matching (2 🔗1)
Pillar 1: Robot Control (2)
🔬 Pillar 2: RL Algorithms & Architecture (6 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
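| 1 | From Pixels to Gigapixels: Bridging Local Inductive Bias and Long-Range Dependencies with Pixel-Mamba | Pixel-Mamba: a pixel-level Mamba model that efficiently processes gigapixel pathology slides without pretraining. | Mamba, SSM, representation learning | | |
| 2 | V"Mean"ba: Visual State Space Models only need 1 hidden dimension | VMeanba: compresses visual state space models by averaging across channels, speeding up image processing. | SSM, state space model | | |
| 3 | Enhancing Contrastive Learning Inspired by the Philosophy of "The Blind Men and the Elephant" | Inspired by the parable of the blind men and the elephant, proposes JointCrop and JointBlur to enhance contrastive learning. | representation learning, contrastive learning | | |
| 4 | Trusted Mamba Contrastive Network for Multi-View Clustering | Proposes the Trusted Mamba Contrastive Network (TMCN) to address untrustworthy fusion in multi-view clustering. | Mamba, contrastive learning | ✅ | |
| 5 | Leveraging Contrastive Learning for Semantic Segmentation with Consistent Labels Across Varying Appearances | Proposes a contrastive-learning-based semantic segmentation method that exploits labels that stay consistent across varying appearances. | contrastive learning | | |
| 6 | Cross-View Consistency Regularisation for Knowledge Distillation | Proposes a knowledge distillation method based on cross-view consistency regularization that improves logit distillation performance. | distillation | | |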
🔬 Pillar 3: Spatial Perception & Semantics (4 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 7 | OmniSplat: Taming Feed-Forward 3D Gaussian Splatting for Omnidirectional Images with Editable Capabilities | OmniSplat: an editable feed-forward 3D Gaussian splatting framework for omnidirectional images. | 3D gaussian splatting, 3DGS, gaussian splatting | ✅ | |
| 8 | Topology-Aware 3D Gaussian Splatting: Leveraging Persistent Homology for Optimized Structural Integrity | Proposes topology-aware 3D Gaussian splatting to optimize the structural integrity of scenes. | 3D gaussian splatting, gaussian splatting, splatting | | |
| 9 | LUCES-MV: A Multi-View Dataset for Near-Field Point Light Source Photometric Stereo | LUCES-MV: a multi-view dataset for near-field point-light-source photometric stereo. | NeRF | | |
| 10 | Query Quantized Neural SLAM | Proposes query-quantized neural SLAM, which speeds up per-frame overfitting and improves reconstruction and tracking accuracy. | implicit representation | | |
🔬 Pillar 9: Embodied Foundation Models (3 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
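| 11 | SilVar: Speech Driven Multimodal Model for Reasoning Visual Question Answering and Object Localization | Proposes SilVar, a speech-driven multimodal model for reasoning-based visual question answering and object localization. | multimodal | | |
| 12 | LLaVA-SLT: Visual Language Tuning for Sign Language Translation | Proposes LLaVA-SLT, which uses visual language tuning to improve gloss-free sign language translation. | large language model, multimodal | | |
| 13 | REO-VLM: Transforming VLM to Meet Regression Challenges in Earth Observation | Proposes REO-VLM to address the difficulty of applying VLMs to regression tasks in remote sensing. | multimodal | | |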
🔬 Pillar 4: Generative Motion (2 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 14 | Two-in-One: Unified Multi-Person Interactive Motion Generation by Latent Diffusion Transformer | Proposes a unified framework based on a latent diffusion Transformer for multi-person interactive motion generation. | motion generation, character animation | | |
| 15 | SemTalk: Holistic Co-speech Motion Generation with Frame-level Semantic Emphasis | SemTalk: a holistic co-speech motion generation method with frame-level semantic emphasis. | motion generation | | |
🔬 Pillar 6: Video Extraction & Matching (2 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 16 | Interact with me: Joint Egocentric Forecasting of Intent to Interact, Attitude and Social Actions | Proposes SocialEgoNet for jointly forecasting intent to interact, attitude, and social actions from an egocentric view. | egocentric, spatiotemporal | | |
| 17 | Context-Aware Outlier Rejection for Robust Multi-View 3D Tracking of Similar Small Birds in An Outdoor Aviary | Proposes a context-aware outlier rejection method for robust multi-view 3D tracking of similar small birds outdoors. | feature matching | ✅ | |
🔬 Pillar 1: Robot Control (2 papers)
| # | Title | Key point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
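| 18 | Generalizable Articulated Object Perception with Superpoints | Proposes a generalizable superpoint-based perception method for articulated objects that improves part segmentation accuracy. | manipulation, foundation model | | |
| 19 | TrojFlow: Flow Models are Natural Targets for Trojan Attacks | Proposes TrojFlow, revealing the vulnerability of flow models to Trojan attacks. | manipulation | | |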