cs.CV (2024-12-21)

📊 19 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

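Pillar 2: RL Algorithms and Architecture (RL & Architecture) (6 🔗1) · Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (4 🔗1) · Pillar 9: Embodied Large Models (Embodied Foundation Models) (3) · Pillar 4: Generative Motion (2) · Pillar 6: Video Extraction and Matching (Video Extraction) (2 🔗1) · Pillar 1: Robot Control (2)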

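🔬 Pillar 2: RL Algorithms and Architecture (RL & Architecture) (6 papers)

| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | From Pixels to Gigapixels: Bridging Local Inductive Bias and Long-Range Dependencies with Pixel-Mamba | Pixel-Mamba: efficiently processes gigapixel pathology slides with a pixel-level Mamba model, without pretraining. | Mamba, SSM, representation learning | |
| 2 | V"Mean"ba: Visual State Space Models only need 1 hidden dimension | VMeanba: compresses visual state space models by averaging over channels to speed up image processing. | SSM, state space model | |
| 3 | Enhancing Contrastive Learning Inspired by the Philosophy of "The Blind Men and the Elephant" | Inspired by "The Blind Men and the Elephant", proposes JointCrop and JointBlur to enhance contrastive learning. | representation learning, contrastive learning | |
| 4 | Trusted Mamba Contrastive Network for Multi-View Clustering | Proposes the Trusted Mamba Contrastive Network (TMCN) to address untrustworthy fusion in multi-view clustering. | Mamba, contrastive learning | |
| 5 | Leveraging Contrastive Learning for Semantic Segmentation with Consistent Labels Across Varying Appearances | Proposes a contrastive-learning-based semantic segmentation method that exploits labels that stay consistent across varying appearances. | contrastive learning | |
| 6 | Cross-View Consistency Regularisation for Knowledge Distillation | Proposes a knowledge distillation method based on cross-view consistency regularisation that improves logit distillation performance. | distillation | |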

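🔬 Pillar 3: Spatial Perception and Semantics (Perception & Semantics) (4 papers)

| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 7 | OmniSplat: Taming Feed-Forward 3D Gaussian Splatting for Omnidirectional Images with Editable Capabilities | OmniSplat: an editable feed-forward 3D Gaussian splatting framework for omnidirectional images. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 8 | Topology-Aware 3D Gaussian Splatting: Leveraging Persistent Homology for Optimized Structural Integrity | Proposes topology-aware 3D Gaussian splatting to optimize the structural integrity of scenes. | 3D gaussian splatting, gaussian splatting, splatting | |
| 9 | LUCES-MV: A Multi-View Dataset for Near-Field Point Light Source Photometric Stereo | LUCES-MV: a multi-view dataset for near-field point light source photometric stereo. | NeRF | |
| 10 | Query Quantized Neural SLAM | Proposes query quantized neural SLAM, which speeds up per-frame overfitting and improves reconstruction and tracking accuracy. | implicit representation | |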

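🔬 Pillar 9: Embodied Large Models (Embodied Foundation Models) (3 papers)

| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 11 | SilVar: Speech Driven Multimodal Model for Reasoning Visual Question Answering and Object Localization | Proposes SilVar, a speech-driven multimodal model for visual question answering reasoning and object localization. | multimodal | |
| 12 | LLaVA-SLT: Visual Language Tuning for Sign Language Translation | Proposes LLaVA-SLT, which uses visual language tuning to improve gloss-free sign language translation. | large language model, multimodal | |
| 13 | REO-VLM: Transforming VLM to Meet Regression Challenges in Earth Observation | Proposes REO-VLM to address the difficulty of applying VLMs to regression tasks in remote sensing. | multimodal | |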

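🔬 Pillar 4: Generative Motion (2 papers)

| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 14 | Two-in-One: Unified Multi-Person Interactive Motion Generation by Latent Diffusion Transformer | Proposes a unified framework based on a latent diffusion Transformer for multi-person interactive motion generation. | motion generation, character animation | |
| 15 | SemTalk: Holistic Co-speech Motion Generation with Frame-level Semantic Emphasis | SemTalk: a holistic co-speech motion generation method with frame-level semantic emphasis. | motion generation | |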

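🔬 Pillar 6: Video Extraction and Matching (Video Extraction) (2 papers)

| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 16 | Interact with me: Joint Egocentric Forecasting of Intent to Interact, Attitude and Social Actions | Proposes SocialEgoNet for jointly forecasting intent to interact, attitude, and social actions from an egocentric view. | egocentric, spatiotemporal | |
| 17 | Context-Aware Outlier Rejection for Robust Multi-View 3D Tracking of Similar Small Birds in An Outdoor Aviary | Proposes a context-aware outlier rejection method for robust multi-view 3D tracking of similar small birds in outdoor environments. | feature matching | |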

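🔬 Pillar 1: Robot Control (2 papers)

| # | Title | One-sentence summary | Tags | 🔗 |
|---|---|---|---|---|
| 18 | Generalizable Articulated Object Perception with Superpoints | Proposes a generalizable superpoint-based method for articulated object perception that improves part segmentation accuracy. | manipulation, foundation model | |
| 19 | TrojFlow: Flow Models are Natural Targets for Trojan Attacks | Proposes TrojFlow, revealing the vulnerability of flow models to Trojan attacks. | manipulation | |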
