cs.CV (2025-03-05)

📊 29 papers in total | 🔗 7 with code

🎯 Navigation by Area of Interest

Pillar 9: Embodied Foundation Models (9 🔗2) · Pillar 3: Perception & Semantics (6 🔗3) · Pillar 2: RL & Architecture (6) · Pillar 8: Physics-based Animation (3 🔗1) · Pillar 1: Robot Control (2 🔗1) · Pillar 4: Generative Motion (2) · Pillar 6: Video Extraction (1)

🔬 Pillar 9: Embodied Foundation Models (9 papers)

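| # | Title | Summary | Tags |
|---|---|---|---|
| 1 | E$^2$AT: Multimodal Jailbreak Defense via Dynamic Joint Optimization for Multimodal Large Language Models | Proposes the E$^2$AT framework, which uses dynamic joint optimization to improve the robustness of multimodal large language models against malicious attacks. | large language model, multimodal |
| 2 | DongbaMIE: A Multimodal Information Extraction Dataset for Evaluating Semantic Understanding of Dongba Pictograms | Builds the DongbaMIE dataset for multimodal information extraction, used to evaluate semantic understanding of Dongba pictograms. | large language model, multimodal |
| 3 | Advancing Multimodal In-Context Learning in Large Vision-Language Models with Task-aware Demonstrations | SabER: task-aware multimodal in-context learning for large vision-language models. | multimodal |
| 4 | DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles | DoraCycle: proposes domain-oriented adaptation of a unified generative model through multimodal cycles, using unpaired data to evolve the model. | multimodal |
| 5 | Mineral segmentation using electron microscope images and spectral sampling through multimodal graph neural networks | Proposes a mineral segmentation method based on multimodal graph neural networks that fuses electron microscope images and spectral data. | multimodal |
| 6 | See What You Are Told: Visual Attention Sink in Large Multimodal Models | Reveals the visual attention sink in large multimodal models and proposes a training-free method for redistributing visual attention. | multimodal |
| 7 | BEVMOSNet: Multimodal Fusion for BEV Moving Object Segmentation | BEVMOSNet: fuses camera, LiDAR, and radar data for moving object segmentation in the BEV view. | multimodal |
| 8 | Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection | Proposes the Phys-AD dataset for physics-grounded visual discrimination and reasoning in industrial anomaly detection. | foundation model |
| 9 | Label-Efficient LiDAR Semantic Segmentation with 2D-3D Vision Transformer Adapters | Proposes BALViT, which uses 2D-3D Vision Transformer adapters for label-efficient LiDAR semantic segmentation. | foundation model |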

🔬 Pillar 3: Perception & Semantics (6 papers)

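| # | Title | Summary | Tags |
|---|---|---|---|
| 10 | NTR-Gaussian: Nighttime Dynamic Thermal Reconstruction with 4D Gaussian Splatting Based on Thermodynamics | Proposes NTR-Gaussian to address nighttime dynamic thermal reconstruction. | gaussian splatting, splatting |
| 11 | Task-Agnostic Attacks Against Vision Foundation Models | Proposes task-agnostic adversarial attacks to evaluate the security of vision foundation models across multiple downstream tasks. | depth estimation, foundation model |
| 12 | Active 6D Pose Estimation for Textureless Objects using Multi-View RGB Frames | Proposes an active 6D pose estimation method based on multi-view RGB images, addressing pose estimation for textureless objects. | 6D pose estimation |
| 13 | BAT: Learning Event-based Optical Flow with Bidirectional Adaptive Temporal Correlation | BAT: learns event-camera optical flow using bidirectional adaptive temporal correlation. | optical flow |
| 14 | DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance | DualDiff: a reward-guided dual-branch diffusion model for high-fidelity driving-scene video generation. | scene reconstruction, multimodal |
| 15 | Improving 6D Object Pose Estimation of metallic Household and Industry Objects | Proposes an improved 6D pose estimation algorithm and dataset for metallic objects. | scene understanding |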

🔬 Pillar 2: RL & Architecture (6 papers)

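| # | Title | Summary | Tags |
|---|---|---|---|
| 16 | JamMa: Ultra-lightweight Local Feature Matching with Joint Mamba | Proposes JamMa, an ultra-lightweight local feature matching method based on Joint Mamba. | Mamba, feature matching |
| 17 | Variance-Aware Loss Scheduling for Multimodal Alignment in Low-Data Settings | Proposes a variance-aware loss scheduling method that improves multimodal alignment in low-data settings. | contrastive learning, multimodal |
| 18 | Enhancing Vietnamese VQA through Curriculum Learning on Raw and Augmented Text Representations | Proposes a Vietnamese VQA framework based on curriculum learning and data augmentation, improving performance in low-resource settings. | curriculum learning, multimodal |
| 19 | Temporal Separation with Entropy Regularization for Knowledge Distillation in Spiking Neural Networks | Proposes a knowledge distillation method based on temporal separation and entropy regularization to improve the performance of spiking neural networks. | distillation, spatiotemporal |
| 20 | Self-Supervised Z-Slice Augmentation for 3D Bio-Imaging via Knowledge Distillation | ZAugNet: a self-supervised knowledge distillation method for enhancing Z-axis resolution in 3D bio-images. | distillation |
| 21 | Lightweight Embedded FPGA Deployment of Learned Image Compression with Knowledge Distillation and Hybrid Quantization | Proposes a lightweight FPGA deployment scheme for learned image compression based on knowledge distillation and hybrid quantization. | distillation |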

🔬 Pillar 8: Physics-based Animation (3 papers)

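| # | Title | Summary | Tags |
|---|---|---|---|
| 22 | DA-STGCN: 4D Trajectory Prediction Based on Spatiotemporal Feature Extraction | Proposes DA-STGCN, which predicts 4D flight trajectories through spatiotemporal feature extraction to improve air traffic management. | spatiotemporal |
| 23 | LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant | Proposes LION-FS, a fast & slow video-language model for online video assistance that improves efficiency and effectiveness. | spatiotemporal, multimodal |
| 24 | Dynamic Neural Surfaces for Elastic 4D Shape Representation and Analysis | Proposes dynamic neural surfaces (D-SNS) for elastic 4D shape representation and analysis without discretization. | spatiotemporal |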

🔬 Pillar 1: Robot Control (2 papers)

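| # | Title | Summary | Tags |
|---|---|---|---|
| 25 | Afford-X: Generalizable and Slim Affordance Reasoning for Task-oriented Manipulation | Afford-X: a generalizable and lightweight affordance reasoning model for task-oriented manipulation. | manipulation, affordance, large language model |
| 26 | Combined Physics and Event Camera Simulator for Slip Detection | Proposes a slip detection simulation pipeline that combines a physics engine with an event camera, for use in robotic manipulation. | manipulation |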

🔬 Pillar 4: Generative Motion (2 papers)

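| # | Title | Summary | Tags |
|---|---|---|---|
| 27 | StickMotion: Generating 3D Human Motions by Drawing a Stickman | StickMotion: generates 3D human motion from stick-figure drawings, enabling global and local motion control. | text-to-motion, motion generation |
| 28 | Mocap-2-to-3: Multi-view Lifting for Monocular Motion Recovery with 2D Pretraining | Mocap-2-to-3: monocular motion recovery via multi-view lifting with 2D pretraining. | motion generation |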

🔬 Pillar 6: Video Extraction (1 paper)

| # | Title | Summary | Tags |
|---|---|---|---|
| 29 | EgoLife: Towards Egocentric Life Assistant | EgoLife: builds an egocentric life assistant based on wearable AI glasses. | egocentric, multimodal |
