cs.CV (2025-03-05)
📊 29 papers in total | 🔗 7 with code
🎯 Interest Area Navigation
Pillar 9: Embodied Foundation Models (9 🔗2)
Pillar 3: Perception & Semantics (6 🔗3)
Pillar 2: RL & Architecture (6)
Pillar 8: Physics-based Animation (3 🔗1)
Pillar 1: Robot Control (2 🔗1)
Pillar 4: Generative Motion (2)
Pillar 6: Video Extraction (1)
🔬 Pillar 9: Embodied Foundation Models (9 papers)
🔬 Pillar 3: Perception & Semantics (6 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 10 | NTR-Gaussian: Nighttime Dynamic Thermal Reconstruction with 4D Gaussian Splatting Based on Thermodynamics | Proposes NTR-Gaussian to address nighttime dynamic thermal reconstruction | gaussian splatting splatting | | |
| 11 | Task-Agnostic Attacks Against Vision Foundation Models | Proposes a task-agnostic adversarial attack and evaluates the security of vision foundation models across multiple downstream tasks | depth estimation foundation model | | |
| 12 | Active 6D Pose Estimation for Textureless Objects using Multi-View RGB Frames | Proposes an active 6D pose estimation method based on multi-view RGB images, addressing pose estimation for textureless objects | 6D pose estimation | ✅ | |
| 13 | BAT: Learning Event-based Optical Flow with Bidirectional Adaptive Temporal Correlation | BAT: learns event-camera optical flow using bidirectional adaptive temporal correlation | optical flow | ✅ | |
| 14 | DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance | DualDiff+: a reward-guided dual-branch diffusion model for high-fidelity driving-scene video generation | scene reconstruction multimodal | ✅ | |
| 15 | Improving 6D Object Pose Estimation of metallic Household and Industry Objects | Proposes an improved 6D pose estimation algorithm and a dataset for metallic objects | scene understanding | | |
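🔬 Pillar 2: RL & Architecture (6 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 16 | JamMa: Ultra-lightweight Local Feature Matching with Joint Mamba | Proposes JamMa, an ultra-lightweight local feature matching method based on Joint Mamba | Mamba feature matching | | |
| 17 | Variance-Aware Loss Scheduling for Multimodal Alignment in Low-Data Settings | Proposes a variance-aware loss scheduling method that improves multimodal alignment in low-data settings | contrastive learning multimodal | | |
| 18 | Enhancing Vietnamese VQA through Curriculum Learning on Raw and Augmented Text Representations | Proposes a Vietnamese VQA framework based on curriculum learning and data augmentation, improving performance in low-resource settings | curriculum learning multimodal | | |
| 19 | Temporal Separation with Entropy Regularization for Knowledge Distillation in Spiking Neural Networks | Proposes a knowledge distillation method based on temporal separation and entropy regularization, improving spiking neural network performance | distillation spatiotemporal | | |
| 20 | Self-Supervised Z-Slice Augmentation for 3D Bio-Imaging via Knowledge Distillation | ZAugNet: a self-supervised knowledge distillation method for enhancing Z-axis resolution in 3D bio-images | distillation | | |
| 21 | Lightweight Embedded FPGA Deployment of Learned Image Compression with Knowledge Distillation and Hybrid Quantization | Proposes a lightweight FPGA deployment scheme for learned image compression based on knowledge distillation and hybrid quantization | distillation | | |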
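🔬 Pillar 8: Physics-based Animation (3 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 22 | DA-STGCN: 4D Trajectory Prediction Based on Spatiotemporal Feature Extraction | Proposes DA-STGCN, which predicts 4D flight trajectories via spatiotemporal feature extraction to improve air traffic management | spatiotemporal | | |
| 23 | LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant | Proposes LION-FS, a fast-and-slow video-language model for online video assistants that improves both efficiency and effectiveness | spatiotemporal multimodal | | |
| 24 | Dynamic Neural Surfaces for Elastic 4D Shape Representation and Analysis | Proposes dynamic neural surfaces (D-SNS) for elastic 4D shape representation and analysis, without discretization | spatiotemporal | ✅ | |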
🔬 Pillar 1: Robot Control (2 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 25 | Afford-X: Generalizable and Slim Affordance Reasoning for Task-oriented Manipulation | Afford-X: a generalizable and lightweight affordance reasoning model for task-oriented manipulation | manipulation affordance large language model | | |
| 26 | Combined Physics and Event Camera Simulator for Slip Detection | Proposes a slip-detection simulation pipeline that combines a physics engine with an event camera, for robotic manipulation | manipulation | ✅ | |
🔬 Pillar 4: Generative Motion (2 papers)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 27 | StickMotion: Generating 3D Human Motions by Drawing a Stickman | StickMotion: generates 3D human motion from stick-figure drawings, with global and local motion control | text-to-motion motion generation | | |
| 28 | Mocap-2-to-3: Multi-view Lifting for Monocular Motion Recovery with 2D Pretraining | Mocap-2-to-3: monocular motion recovery via multi-view lifting with 2D pretraining | motion generation | | |
🔬 Pillar 6: Video Extraction (1 paper)
| # | Title | One-sentence summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 29 | EgoLife: Towards Egocentric Life Assistant | EgoLife: builds an egocentric life assistant based on wearable AI glasses | egocentric multimodal | | |