cs.CV (2025-02-11)

📊 22 papers in total | 🔗 7 with code

🎯 Topic Navigation

Pillar 2: RL Algorithms & Architecture (6 papers, 🔗 2 with code)
Pillar 6: Video Extraction & Matching (5 papers, 🔗 3 with code)
Pillar 1: Robot Control (4 papers)
Pillar 9: Embodied Foundation Models (4 papers, 🔗 1 with code)
Pillar 3: Spatial Perception & Semantics (2 papers, 🔗 1 with code)
Pillar 8: Physics-based Animation (1 paper)

🔬 Pillar 2: RL Algorithms & Architecture (6 papers)

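# | Title | One-line summary | Tags | 🔗
1 | Flow Distillation Sampling: Regularizing 3D Gaussians with Pre-trained Matching Priors | Proposes Flow Distillation Sampling, which regularizes 3D Gaussians with pre-trained matching priors to improve geometric reconstruction quality. | distillation, 3D gaussian splatting, 3DGS
2 | A Survey on Mamba Architecture for Vision Applications | Surveys applications of the Mamba architecture to vision tasks and explores its potential for image and video understanding. | Mamba, SSM, spatiotemporal
3 | HOMIE: Histopathology Omni-modal Embedding for Pathology Composed Retrieval | HOMIE: a histopathology omni-modal embedding method for pathology composed retrieval. | predictive model, large language model, multimodal
4 | A Survey of Representation Learning, Optimization Strategies, and Applications for Omnidirectional Vision | A survey of deep learning for omnidirectional vision, focusing on representation learning, optimization strategies, and applications. | representation learning, optical flow
5 | PlaySlot: Learning Inverse Latent Dynamics for Controllable Object-Centric Video Prediction and Planning | PlaySlot learns inverse latent dynamics for controllable, object-centric video prediction and planning. | world model, latent dynamics
6 | Articulate That Object Part (ATOP): 3D Part Articulation via Text and Motion Personalization | ATOP proposes a 3D part articulation method based on text and motion personalization. | distillation, motion generation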

🔬 Pillar 6: Video Extraction & Matching (5 papers)

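# | Title | One-line summary | Tags | 🔗
7 | Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models | Proposes Anomaly-OV for zero-shot anomaly detection and reasoning, substantially improving fine-grained anomaly recognition. | feature matching, large language model, multimodal
8 | EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering | Proposes the EgoTextVQA benchmark for evaluating egocentric, scene-text-aware video question answering. | egocentric, large language model, multimodal
9 | PRVQL: Progressive Knowledge-guided Refinement for Robust Egocentric Visual Query Localization | Proposes PRVQL, which uses progressive knowledge-guided refinement to improve visual query localization in egocentric videos. | egocentric, Ego4D
10 | EventEgo3D++: 3D Human Motion Capture from a Head-Mounted Event Camera | EventEgo3D++ performs 3D human motion capture with a head-mounted event camera. | SMPL, egocentric
11 | Few-Shot Multi-Human Neural Rendering Using Geometry Constraints | Proposes a few-shot multi-human neural rendering method based on geometry constraints that handles occlusion and clutter. | SMPL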

🔬 Pillar 1: Robot Control (4 papers)

# | Title | One-line summary | Tags | 🔗
12 | TranSplat: Surface Embedding-guided 3D Gaussian Splatting for Transparent Object Manipulation | TranSplat uses surface-embedding-guided 3D Gaussian Splatting for transparent object manipulation. | manipulation, 3D gaussian splatting, gaussian splatting
13 | DeepSeek on a Trip: Inducing Targeted Visual Hallucinations via Representation Vulnerabilities | Induces targeted visual hallucinations in DeepSeek models by exploiting representation vulnerabilities. | manipulation, large language model, multimodal
14 | Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving | Proposes PreWorld, a semi-supervised, vision-centric 3D occupancy world model for autonomous driving. | motion planning, world model
15 | Diffusion Suction Grasping with Large-Scale Parcel Dataset | Proposes Diffusion-Suction to address suction grasp planning in complex parcel-picking scenarios. | manipulation, affordance, grasp prediction

🔬 Pillar 9: Embodied Foundation Models (4 papers)

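# | Title | One-line summary | Tags | 🔗
16 | Towards a Robust Framework for Multimodal Hate Detection: A Study on Video vs. Image-based Content | Proposes a robust framework for multimodal hate detection, focusing on how video and image content differ. | multimodal
17 | NanoVLMs: How small can we go and still make coherent Vision Language Models? | Proposes NanoVLMs, exploring the smallest model size at which vision-language models remain coherent. | large language model, multimodal
18 | Scaling Pre-training to One Hundred Billion Data for Vision Language Models | Studies large-scale vision-language pre-training, examining how hundred-billion-scale data affects model performance and cultural diversity. | multimodal
19 | Confidence-calibrated covariate shift correction for few-shot classification in Vision-Language Models | Proposes CalShift, which calibrates confidence and corrects covariate shift to improve the generalization of vision-language models in few-shot classification. | foundation model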

🔬 Pillar 3: Spatial Perception & Semantics (2 papers)

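# | Title | One-line summary | Tags | 🔗
20 | TRAVEL: Training-Free Retrieval and Alignment for Vision-and-Language Navigation | Proposes TRAVEL, a training-free retrieval and alignment method for vision-and-language navigation. | semantic map, VLMAP, VLN
21 | VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation | VidCRAFT3 performs image-to-video generation with control over camera, objects, and lighting. | optical flow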

🔬 Pillar 8: Physics-based Animation (1 paper)

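# | Title | One-line summary | Tags | 🔗
22 | Enhancing Video Understanding: Deep Neural Networks for Spatiotemporal Analysis | Reviews video understanding algorithms and datasets, exploring spatiotemporal features and deep networks. | spatiotemporal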
