cs.CV (2024-08-13)

📊 21 papers in total | 🔗 2 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (6 🔗1) Pillar 9: Embodied Foundation Models (6 🔗1) Pillar 2: RL & Architecture (3) Pillar 5: Interaction & Reaction (2) Pillar 4: Generative Motion (1) Pillar 1: Robot Control (1) Pillar 7: Motion Retargeting (1) Pillar 8: Physics-based Animation (1)

🔬 Pillar 3: Perception & Semantics (6 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
1	SpectralGaussians: Semantic, spectral 3D Gaussian splatting for multi-spectral scene representation, visualization and analysis	Proposes SpectralGaussians, a semantic, spectral 3D Gaussian splatting approach for representing, visualizing, and analyzing multi-spectral scenes.	3D gaussian splatting, 3DGS, gaussian splatting
2	HDRGS: High Dynamic Range Gaussian Splatting	Proposes HDRGS, which uses high dynamic range Gaussian splatting to reconstruct high-quality HDR scenes.	gaussian splatting, splatting, NeRF
3	NeRF-US: Removing Ultrasound Imaging Artifacts from Neural Radiance Fields in the Wild	NeRF-US: a method for removing ultrasound imaging artifacts from neural radiance fields captured in the wild.	NeRF, neural radiance field
4	SceneGPT: A Language Model for 3D Scene Understanding	SceneGPT: a language model for 3D scene understanding that requires no 3D pre-training.	scene understanding, affordance, spatial relationship
5	SlotLifter: Slot-guided Feature Lifting for Learning Object-centric Radiance Fields	Proposes SlotLifter, which learns object-centric radiance fields via slot-guided feature lifting, enabling scene reconstruction and decomposition.	scene reconstruction
6	ActiveNeRF: Learning Accurate 3D Geometry by Active Pattern Projection	ActiveNeRF: learns accurate 3D geometry through active pattern projection.	NeRF	✅

🔬 Pillar 9: Embodied Foundation Models (6 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
7	CROME: Cross-Modal Adapters for Efficient Multimodal LLM	CROME: cross-modal adapters for efficient multimodal LLMs.	large language model, multimodal, instruction following
8	PathInsight: Instruction Tuning of Multimodal Datasets and Models for Intelligence Assisted Diagnosis in Histopathology	PathInsight: instruction-tuned multimodal models for intelligence-assisted diagnosis in histopathology.	multimodal
9	Multimodal Analysis of White Blood Cell Differentiation in Acute Myeloid Leukemia Patients using a β-Variational Autoencoder	Proposes a β-VAE-based multimodal analysis method for understanding white blood cell differentiation in acute myeloid leukemia patients.	multimodal
10	Sumotosima: A Framework and Dataset for Classifying and Summarizing Otoscopic Images	Sumotosima: a deep learning framework and dataset for classifying and summarizing otoscopic images.	multimodal	✅
11	DC3DO: Diffusion Classifier for 3D Objects	DC3DO: zero-shot 3D object classification with diffusion models, with no additional training.	multimodal
12	Specialized Change Detection using Segment Anything	Proposes a SAM-based specialized change detection method for detecting the disappearance of specific objects.	foundation model

🔬 Pillar 2: RL & Architecture (3 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
13	Integrating Saliency Ranking and Reinforcement Learning for Enhanced Object Detection	Combines saliency ranking with reinforcement learning to improve lightweight object detection performance.	reinforcement learning, deep reinforcement learning
14	Breaking Class Barriers: Efficient Dataset Distillation via Inter-Class Feature Compensator	Proposes the Inter-class Feature Compensator (INFER) to efficiently address inter-class feature isolation in dataset distillation.	distillation
15	Oracle Bone Script Similiar Character Screening Approach Based on Simsiam Contrastive Learning and Supervised Learning	Proposes a method for screening similar oracle bone script characters based on SimSiam contrastive learning and supervised learning.	contrastive learning

🔬 Pillar 5: Interaction & Reaction (2 papers)

#	Title	One-sentence summary	Tags	🔗	⭐
16	Efficient Human-Object-Interaction (EHOI) Detection via Interaction Label Coding and Conditional Decision	Proposes EHOI, an efficient human-object interaction detector that balances performance, efficiency, and interpretability.	human-object interaction, HOI
17	MV-DETR: Multi-modality indoor object detection by Multi-View DEtecton TRansformers	MV-DETR: multi-modality indoor object detection based on multi-view detection transformers.	ReMoS

🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
18	ViMo: Generating Motions from Casual Videos	Proposes ViMo to address the challenge of generating 3D human motion from videos.	motion generation, video-to-motion

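🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
19	Controlling the World by Sleight of Hand	CosHand: proposes an action-conditioned generative model that predicts how an image changes after a hand interacts with objects.	manipulation, world model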

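🔬 Pillar 7: Motion Retargeting (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
20	Visual Neural Decoding via Improved Visual-EEG Semantic Consistency	Proposes a Visual-EEG semantic decoupling framework that improves the semantic consistency of visual neural decoding from EEG signals.	geometric consistency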

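🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	One-sentence summary	Tags	🔗	⭐
21	Dynamic and Compressive Adaptation of Transformers From Images to Videos	Proposes InTI, which achieves compressive adaptation of Transformers from images to videos through dynamic inter-frame token interpolation.	spatiotemporal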
