cs.CV (2024-07-26)

📊 15 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (5 🔗2)
Pillar 9: Embodied Foundation Models (5 🔗1)
Pillar 3: Perception & Semantics (3 🔗2)
Pillar 1: Robot Control (2 🔗1)

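🔬 Pillar 2: RL & Architecture (5 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	Graph-based Unsupervised Disentangled Representation Learning via Multimodal Large Language Models	Proposes an unsupervised disentangled representation learning framework built on graphs and multimodal large language models, addressing correlations among semantic factors.	DRL, representation learning, large language model
2	ScalingGaussian: Enhancing 3D Content Creation with Generative Gaussian Splatting	ScalingGaussian: combines 3D and 2D diffusion models to improve high-quality 3D content generation.	distillation, gaussian splatting, splatting
3	Towards A Generalizable Pathology Foundation Model via Unified Knowledge Distillation	Proposes a generalizable pathology foundation model based on unified knowledge distillation, improving generalization across clinical tasks.	representation learning, distillation, foundation model
4	VSSD: Vision Mamba with Non-Causal State Space Duality	Proposes VSSD, a vision Mamba model with non-causal state space duality that improves performance on vision tasks.	Mamba, SSM, state space model	✅
5	Modality-Balanced Learning for Multimedia Recommendation	Proposes counterfactual knowledge distillation to address modality imbalance in multimodal recommendation.	distillation, multimodal	✅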

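🔬 Pillar 9: Embodied Foundation Models (5 papers)

#	Title	One-line summary	Tags	🔗	⭐
6	MangaUB: A Manga Understanding Benchmark for Large Multimodal Models	MangaUB: a new benchmark for evaluating the manga understanding capabilities of large multimodal models.	multimodal
7	Algorithm Research of ELMo Word Embedding and Deep Learning Multimodal Transformer in Image Description	Proposes ELMo-MCT, a zero-shot medical image captioning algorithm that improves generalization on known categories.	multimodal
8	HICEScore: A Hierarchical Metric for Image Captioning Evaluation	Proposes HICEScore, a hierarchical reference-free metric for image captioning evaluation that addresses the insensitivity of existing metrics to local hallucinations and fine-grained visual information.	large language model, multimodal	✅
9	Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment	Proposes MARNet, which uses diffusion models to unify visual and semantic feature spaces, strengthening cross-modal alignment and making image classification more robust.	multimodal
10	Learning Spectral-Decomposed Tokens for Domain Generalized Semantic Segmentation	Proposes a spectral-decomposed token learning framework that improves domain-generalized semantic segmentation.	foundation model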

🔬 Pillar 3: Perception & Semantics (3 papers)

#	Title	One-line summary	Tags	🔗	⭐
11	IOVS4NeRF:Incremental Optimal View Selection for Large-Scale NeRFs	IOVS4NeRF: an incremental optimal view selection method for large-scale NeRFs.	NeRF, neural radiance field, scene reconstruction
12	HybridDepth: Robust Metric Depth Fusion by Leveraging Depth from Focus and Single-Image Priors	Proposes HybridDepth, which fuses depth from focus with single-image priors to achieve robust metric depth estimation.	depth estimation, metric depth	✅
13	Revisit Event Generation Model: Self-Supervised Learning of Event-to-Video Reconstruction with Implicit Neural Representations	Proposes EvINR, which learns event-to-video reconstruction self-supervised with implicit neural representations, without requiring optical flow estimation.	optical flow, spatiotemporal	✅

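🔬 Pillar 1: Robot Control (2 papers)

#	Title	One-line summary	Tags	🔗	⭐
14	Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers	Proposes Relational Priors Distillation (RPD) to improve cross-domain point cloud classification.	sim-to-real, distillation	✅
15	Floating No More: Object-Ground Reconstruction from a Single Image	Proposes ORG, a model that reconstructs the 3D geometric relationship between objects and the ground from a single image, fixing the problem of objects appearing to float.	manipulation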
