cs.CV (2025-03-05)

📊 29 papers in total | 🔗 7 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (9 🔗2) | Pillar 3: Perception & Semantics (6 🔗3) | Pillar 2: RL & Architecture (6) | Pillar 8: Physics-based Animation (3 🔗1) | Pillar 1: Robot Control (2 🔗1) | Pillar 4: Generative Motion (2) | Pillar 6: Video Extraction (1)

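🔬 Pillar 9: Embodied Foundation Models (9 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	E$^2$AT: Multimodal Jailbreak Defense via Dynamic Joint Optimization for Multimodal Large Language Models	Proposes the E$^2$AT framework, which uses dynamic joint optimization to make multimodal large language models more robust to malicious attacks.	large language model multimodal
2	DongbaMIE: A Multimodal Information Extraction Dataset for Evaluating Semantic Understanding of Dongba Pictograms	Builds the DongbaMIE dataset for multimodal information extraction, used to evaluate semantic understanding of Dongba pictograms.	large language model multimodal
3	Advancing Multimodal In-Context Learning in Large Vision-Language Models with Task-aware Demonstrations	SabER: task-aware multimodal in-context learning for large vision-language models.	multimodal
4	DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles	DoraCycle: adapts a unified generative model to a target domain through multimodal cycles, using unpaired data.	multimodal	✅
5	Mineral segmentation using electron microscope images and spectral sampling through multimodal graph neural networks	Proposes a mineral segmentation method based on multimodal graph neural networks that fuses electron microscope images with spectral data.	multimodal
6	See What You Are Told: Visual Attention Sink in Large Multimodal Models	Identifies the visual attention sink in large multimodal models and proposes a training-free method for redistributing visual attention.	multimodal
7	BEVMOSNet: Multimodal Fusion for BEV Moving Object Segmentation	BEVMOSNet: fuses camera, LiDAR, and radar data for moving object segmentation in bird's-eye view (BEV).	multimodal
8	Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection	Proposes the Phys-AD dataset for physics-grounded industrial anomaly detection, covering visual discrimination and reasoning.	foundation model	✅
9	Label-Efficient LiDAR Semantic Segmentation with 2D-3D Vision Transformer Adapters	Proposes BALViT, which uses 2D-3D Vision Transformer adapters for label-efficient LiDAR semantic segmentation.	foundation model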

🔬 Pillar 3: Perception & Semantics (6 papers)

#	Title	One-line summary	Tags	🔗	⭐
10	NTR-Gaussian: Nighttime Dynamic Thermal Reconstruction with 4D Gaussian Splatting Based on Thermodynamics	Proposes NTR-Gaussian for nighttime dynamic thermal reconstruction.	gaussian splatting splatting
11	Task-Agnostic Attacks Against Vision Foundation Models	Proposes task-agnostic adversarial attacks to assess the security of vision foundation models across multiple downstream tasks.	depth estimation foundation model
12	Active 6D Pose Estimation for Textureless Objects using Multi-View RGB Frames	Proposes an active 6D pose estimation method based on multi-view RGB images for textureless objects.	6D pose estimation	✅
13	BAT: Learning Event-based Optical Flow with Bidirectional Adaptive Temporal Correlation	BAT: learns event-camera optical flow using bidirectional adaptive temporal correlation.	optical flow	✅
14	DualDiff+: Dual-Branch Diffusion for High-Fidelity Video Generation with Reward Guidance	DualDiff+: a reward-guided dual-branch diffusion model for high-fidelity driving-scene video generation.	scene reconstruction multimodal	✅
15	Improving 6D Object Pose Estimation of metallic Household and Industry Objects	Proposes an improved 6D pose estimation algorithm and a dataset for metallic objects.	scene understanding

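🔬 Pillar 2: RL & Architecture (6 papers)

#	Title	One-line summary	Tags	🔗	⭐
16	JamMa: Ultra-lightweight Local Feature Matching with Joint Mamba	Proposes JamMa, an ultra-lightweight local feature matching method based on Joint Mamba.	Mamba feature matching
17	Variance-Aware Loss Scheduling for Multimodal Alignment in Low-Data Settings	Proposes variance-aware loss scheduling to improve multimodal alignment in low-data settings.	contrastive learning multimodal
18	Enhancing Vietnamese VQA through Curriculum Learning on Raw and Augmented Text Representations	Proposes a Vietnamese VQA framework based on curriculum learning and data augmentation to improve performance in low-resource settings.	curriculum learning multimodal
19	Temporal Separation with Entropy Regularization for Knowledge Distillation in Spiking Neural Networks	Proposes a knowledge distillation method based on temporal separation and entropy regularization to improve spiking neural network performance.	distillation spatiotemporal
20	Self-Supervised Z-Slice Augmentation for 3D Bio-Imaging via Knowledge Distillation	ZAugNet: a self-supervised knowledge distillation method for enhancing Z-axis resolution in 3D bio-images.	distillation
21	Lightweight Embedded FPGA Deployment of Learned Image Compression with Knowledge Distillation and Hybrid Quantization	Proposes a lightweight FPGA deployment scheme for learned image compression based on knowledge distillation and hybrid quantization.	distillation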

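🔬 Pillar 8: Physics-based Animation (3 papers)

#	Title	One-line summary	Tags	🔗	⭐
22	DA-STGCN: 4D Trajectory Prediction Based on Spatiotemporal Feature Extraction	Proposes DA-STGCN, which predicts 4D flight trajectories via spatiotemporal feature extraction to support air traffic management.	spatiotemporal
23	LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant	Proposes LION-FS, a fast-and-slow video-language model for online video assistance that improves both efficiency and efficacy.	spatiotemporal multimodal
24	Dynamic Neural Surfaces for Elastic 4D Shape Representation and Analysis	Proposes dynamic neural surfaces (D-SNS) for elastic 4D shape representation and analysis without discretization.	spatiotemporal	✅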

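🔬 Pillar 1: Robot Control (2 papers)

#	Title	One-line summary	Tags	🔗	⭐
25	Afford-X: Generalizable and Slim Affordance Reasoning for Task-oriented Manipulation	Afford-X: a generalizable and slim affordance reasoning model for task-oriented manipulation.	manipulation affordance large language model
26	Combined Physics and Event Camera Simulator for Slip Detection	Proposes a simulation pipeline combining a physics engine with an event camera simulator for slip detection in robotic manipulation.	manipulation	✅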

🔬 Pillar 4: Generative Motion (2 papers)

#	Title	One-line summary	Tags	🔗	⭐
27	StickMotion: Generating 3D Human Motions by Drawing a Stickman	StickMotion: generates 3D human motion from a hand-drawn stickman, with global and local motion control.	text-to-motion motion generation
28	Mocap-2-to-3: Multi-view Lifting for Monocular Motion Recovery with 2D Pretraining	Mocap-2-to-3: monocular motion recovery via multi-view lifting with 2D pretraining.	motion generation

🔬 Pillar 6: Video Extraction (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
29	EgoLife: Towards Egocentric Life Assistant	EgoLife: builds an egocentric life assistant based on wearable AI glasses.	egocentric multimodal
