cs.CV (2024-05-13)

📊 23 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (8 🔗2) · Pillar 9: Embodied Foundation Models (7 🔗1) · Pillar 3: Perception & Semantics (3) · Pillar 4: Generative Motion (2) · Pillar 7: Motion Retargeting (1) · Pillar 8: Physics-based Animation (1) · Pillar 1: Robot Control (1)

🔬 Pillar 2: RL & Architecture (8 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
1	SignAvatar: Sign Language 3D Motion Reconstruction and Generation	SignAvatar: a Transformer-based framework for 3D sign language motion reconstruction and generation	curriculum learning motion reconstruction
2	FORESEE: Multimodal and Multi-view Representation Learning for Robust Prediction of Cancer Survival	FORESEE: a multimodal, multi-view representation learning framework for cancer survival prediction	representation learning masked autoencoder multimodal
3	Motion Keyframe Interpolation for Any Human Skeleton via Temporally Consistent Point Cloud Sampling and Reconstruction	Proposes PC-MRL, which interpolates motion keyframes for any human skeleton via temporally consistent point cloud sampling and reconstruction	representation learning human motion motion representation
4	OverlapMamba: Novel Shift State Space Model for LiDAR-based Place Recognition	Proposes OverlapMamba, a novel shift state space model for LiDAR-based place recognition	Mamba SSM state space model
5	MambaOut: Do We Really Need Mamba for Vision?	Proposes the MambaOut model, shows that Mamba is unnecessary for image classification, and explores its potential for long-sequence visual tasks	Mamba SSM state space model	✅
6	GMSR:Gradient-Guided Mamba for Spectral Reconstruction from RGB Images	Proposes GMSR-Net, a gradient-guided Mamba network for spectral reconstruction from RGB images that balances accuracy and efficiency	Mamba	✅
7	MonoMAE: Enhancing Monocular 3D Detection through Depth-Aware Masked Autoencoders	MonoMAE: enhances monocular 3D object detection with depth-aware masked autoencoders	masked autoencoder
8	Sakuga-42M Dataset: Scaling Up Cartoon Research	Introduces Sakuga-42M, a large-scale cartoon dataset to advance research on cartoon video understanding and generation	Mamba foundation model

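🔬 Pillar 9: Embodied Foundation Models (7 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
9	MedVersa: A Generalist Foundation Model for Medical Image Interpretation	MedVersa: a generalist foundation model for medical image interpretation, with performance comparable to specialist systems	foundation model multimodal
10	AnoVox: A Benchmark for Multimodal Anomaly Detection in Autonomous Driving	AnoVox: a large-scale benchmark dataset for multimodal anomaly detection in autonomous driving	multimodal
11	IHC Matters: Incorporating IHC analysis to H&E Whole Slide Image Analysis for Improved Cancer Grading via Two-stage Multimodal Bilinear Pooling Fusion	Proposes a two-stage multimodal bilinear pooling fusion model that uses IHC to improve cancer grading on H&E images	multimodal
12	Improving Multimodal Learning with Multi-Loss Gradient Modulation	Proposes a multi-loss gradient modulation method to improve modality fusion in multimodal learning	multimodal
13	FreeVA: Offline MLLM as Training-Free Video Assistant	FreeVA: a training-free offline MLLM video assistant that outperforms video instruction-tuning methods	large language model multimodal	✅
14	Investigating the Semantic Robustness of CLIP-based Zero-Shot Anomaly Segmentation	Studies the semantic robustness of CLIP-based zero-shot anomaly segmentation, revealing performance degradation under semantic perturbations	foundation model
15	Can Better Text Semantics in Prompt Tuning Improve VLM Generalization?	Proposes a prompt tuning method based on LLM-generated class descriptions to improve VLM generalization	large language model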

🔬 Pillar 3: Perception & Semantics (3 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
16	GaussianVTON: 3D Human Virtual Try-ON via Multi-Stage Gaussian Splatting Editing with Image Prompting	GaussianVTON: a 3D human virtual try-on method using multi-stage Gaussian Splatting editing with image prompting	gaussian splatting splatting
17	Synergistic Integration of Coordinate Network and Tensorial Feature for Improving Neural Radiance Fields from Sparse Inputs	Proposes a NeRF method that combines a coordinate network with tensorial features to improve reconstruction from sparse inputs	NeRF neural radiance field
18	SceneFactory: A Workflow-centric and Unified Framework for Incremental Scene Modeling	SceneFactory: a workflow-centric, unified framework for incremental scene modeling that supports multiple reconstruction tasks	depth estimation scene reconstruction

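🔬 Pillar 4: Generative Motion (2 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
19	Establishing a Unified Evaluation Framework for Human Motion Generation: A Comparative Analysis of Metrics	Proposes a unified evaluation framework for human motion generation, comparing existing metrics and introducing a temporal-warping diversity metric	motion generation human motion human motion generation
20	Generating Human Motion in 3D Scenes from Text Descriptions	Proposes a method for generating human-scene interaction motion in 3D scenes from text descriptions	motion generation human-scene interaction human motion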

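🔬 Pillar 7: Motion Retargeting (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
21	oTTC: Object Time-to-Contact for Motion Estimation in Autonomous Driving	Proposes oTTC, which extends object detection models to estimate object time-to-contact in autonomous driving	motion estimation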

🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
22	A Semantic and Motion-Aware Spatiotemporal Transformer Network for Action Detection	Proposes a semantic and motion-aware spatiotemporal Transformer network for action detection	spatiotemporal

🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
23	Boostlet.js: Image processing plugins for the web via JavaScript injection	Boostlet.js: image processing plugins for the web via JavaScript injection	manipulation
