cs.CV (2024-08-14)

📊 16 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (7) · Pillar 2: RL & Architecture (4 🔗2) · Pillar 3: Perception & Semantics (2) · Pillar 5: Interaction & Reaction (1) · Pillar 1: Robot Control (1) · Pillar 8: Physics-based Animation (1)

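🔬 Pillar 9: Embodied Foundation Models (7 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
1	MathScape: Benchmarking Multimodal Large Language Models in Real-World Mathematical Contexts	Proposes MathScape, a benchmark for evaluating multimodal large language models in real-world mathematical scenarios.	large language model multimodal
2	Modality Invariant Multimodal Learning to Handle Missing Modalities: A Single-Branch Approach	Proposes a modality-invariant multimodal learning method that improves robustness when modalities are missing.	multimodal
3	Robust Semi-supervised Multimodal Medical Image Segmentation via Cross Modality Collaboration	Proposes a robust semi-supervised multimodal medical image segmentation framework that addresses data scarcity and modality misalignment.	multimodal
4	Enhancing Visual Question Answering through Ranking-Based Hybrid Training and Multimodal Fusion	Proposes the Rank VQA model, which improves visual question answering performance through ranking-based hybrid training and multimodal fusion.	multimodal
5	LLMI3D: MLLM-based 3D Perception from a Single 2D Image	Proposes LLMI3D to address 3D perception from a single 2D image.	large language model multimodal
6	Segment Using Just One Example	Proposes a semantic segmentation method based on a single example image, leveraging SAM with automatically generated prompts.	foundation model
7	Bridging Information Asymmetry in Text-video Retrieval: A Data-centric Approach	Proposes a data-centric text-video retrieval framework that addresses information asymmetry by enriching text representations.	large language model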

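🔬 Pillar 2: RL & Architecture (4 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
8	Progressive Radiance Distillation for Inverse Rendering with Gaussian Splatting	Proposes progressive radiance distillation, combining physically based rendering with Gaussian splatting for high-quality inverse rendering.	distillation gaussian splatting splatting
9	End-to-end Semantic-centric Video-based Multimodal Affective Computing	Proposes the SemanticMAC framework to address semantic imbalance and mismatch in video-based multimodal affective computing.	representation learning contrastive learning multimodal
10	Domain-invariant Representation Learning via Segment Anything Model for Blood Cell Classification	Proposes a SAM-based domain-invariant representation learning framework that improves cross-domain blood cell classification accuracy.	representation learning foundation model	✅
11	Knowledge Distillation with Refined Logits	Proposes Refined Logit Distillation (RLD), which improves knowledge distillation while preserving inter-class correlations.	distillation	✅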

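🔬 Pillar 3: Perception & Semantics (2 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
12	Rethinking Open-Vocabulary Segmentation of Radiance Fields in 3D Space	Proposes an open-vocabulary segmentation method for radiance fields in 3D space, enabling complete 3D semantic understanding.	3DGS NeRF open-vocabulary
13	Enhanced Scale-aware Depth Estimation for Monocular Endoscopic Scenes with Geometric Modeling	Proposes an enhanced monocular depth estimation method for endoscopic scenes based on geometric modeling, addressing the scale-awareness problem.	depth estimation monocular depth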

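🔬 Pillar 5: Interaction & Reaction (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
14	UAHOI: Uncertainty-aware Robust Interaction Learning for HOI Detection	Proposes UAHOI, which improves the accuracy and robustness of HOI detection through uncertainty-aware learning.	human-object interaction HOI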

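🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
15	3D Gaussian Editing with A Single Image	Proposes a 3D Gaussian editing method driven by a single image, enabling intuitive manipulation of 3D scenes.	manipulation 3D gaussian splatting gaussian splatting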

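🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
16	G$^2$V$^2$former: Graph Guided Video Vision Transformer for Face Anti-Spoofing	Proposes G$^2$V$^2$former, which combines face and landmark information to address the lack of dynamic cues in video face anti-spoofing.	spatiotemporal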
