cs.CV (2025-04-29)

📊 19 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (7 🔗2) | Pillar 2: RL & Architecture (4 🔗1) | Pillar 4: Generative Motion (3) | Pillar 3: Perception & Semantics (2) | Pillar 6: Video Extraction (1) | Pillar 1: Robot Control (1) | Pillar 5: Interaction & Reaction (1)

🔬 Pillar 9: Embodied Foundation Models (7 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
1	Plant Disease Detection through Multimodal Large Language Models and Convolutional Neural Networks	Combines multimodal LLMs with CNNs for high-accuracy automatic detection of plant leaf diseases	large language model multimodal
2	X-Fusion: Introducing New Modality to Frozen Large Language Models	X-Fusion: introduces new modalities to frozen large language models to improve performance on multimodal tasks	large language model multimodal
3	FedMVP: Federated Multimodal Visual Prompt Tuning for Vision-Language Models	FedMVP: federated multimodal visual prompt tuning that improves the generalization of vision-language models	multimodal	✅
4	CMT: A Cascade MAR with Topology Predictor for Multimodal Conditional CAD Generation	Proposes the CMT framework to address multimodal CAD generation	multimodal
5	LMME3DHF: Benchmarking and Evaluating Multimodal 3D Human Face Generation with LMMs	Proposes LMME3DHF, an LMM-based evaluator of the quality and authenticity of AI-generated 3D human faces, and builds the large-scale benchmark Gen3DHF.	multimodal
6	LymphAtlas- A Unified Multimodal Lymphoma Imaging Repository Delivering AI-Enhanced Diagnostic Insight	Builds LymphAtlas, a multimodal lymphoma imaging dataset, to deliver AI-enhanced diagnostic insight	multimodal	✅
7	Classifier-to-Bias: Toward Unsupervised Automatic Bias Detection for Visual Classifiers	Proposes Classifier-to-Bias (C2B) for unsupervised automatic bias detection in visual classifiers.	large language model

🔬 Pillar 2: RL & Architecture (4 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
8	Large-scale visual SLAM for in-the-wild videos	Proposes a robust visual SLAM system for reconstructing online videos captured in unstructured scenes.	predictive model visual odometry visual SLAM
9	MambaMoE: Mixture-of-Spectral-Spatial-Experts State Space Model for Hyperspectral Image Classification	Proposes MambaMoE, which uses a mixture-of-experts model for hyperspectral image classification, improving accuracy and efficiency.	Mamba state space model HSI	✅
10	SAM-Guided Robust Representation Learning for One-Shot 3D Medical Image Segmentation	Proposes the RRL-MedSAM framework, which uses SAM to improve one-shot 3D medical image segmentation	representation learning distillation foundation model
11	DS_FusionNet: Dynamic Dual-Stream Fusion with Bidirectional Knowledge Distillation for Plant Disease Recognition	DS_FusionNet: dynamic dual-stream fusion with bidirectional knowledge distillation for plant disease recognition	distillation

🔬 Pillar 4: Generative Motion (3 papers)

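#	Title	One-sentence summary	Tags	🔗	⭐
12	AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation	AlignDiT: a multimodal aligned diffusion Transformer for synchronized speech generation	classifier-free guidance multimodal
13	Efficient Listener: Dyadic Facial Motion Synthesis via Action Diffusion	Proposes a diffusion-based facial motion generation method that efficiently synthesizes listeners' facial expressions in dialogue.	motion synthesis
14	Floating Car Observers in Intelligent Transportation Systems: Detection Modeling and Temporal Insights	Proposes detection modeling and temporal analysis methods for vehicle detection with floating car observers in intelligent transportation systems	penetration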

🔬 Pillar 3: Perception & Semantics (2 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
15	GaussTrap: Stealthy Poisoning Attacks on 3D Gaussian Splatting for Targeted Scene Confusion	GaussTrap: stealthy poisoning attacks on 3D Gaussian Splatting for targeted scene confusion	3D gaussian splatting 3DGS gaussian splatting
16	EfficientHuman: Efficient Training and Reconstruction of Moving Human using Articulated 2D Gaussian	EfficientHuman: uses articulated 2D Gaussians for fast training and reconstruction of moving humans	3D gaussian splatting 3DGS gaussian splatting

🔬 Pillar 6: Video Extraction (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
17	MemeBLIP2: A novel lightweight multimodal system to detect harmful memes	Proposes MemeBLIP2, a lightweight multimodal system for detecting harmful memes	HuMoR multimodal

🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
18	TesserAct: Learning 4D Embodied World Models	TesserAct: learns 4D world models for embodied agents, enabling spatiotemporally consistent scene prediction.	manipulation policy learning world model

🔬 Pillar 5: Interaction & Reaction (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
19	PartHOI: Part-based Hand-Object Interaction Transfer via Generalized Cylinders	PartHOI: part-based hand-object interaction transfer using generalized cylinders	HOI
