cs.CV (2024-08-20)

📊 30 papers in total | 🔗 11 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (11 🔗3) · Pillar 9: Embodied Foundation Models (7 🔗3) · Pillar 2: RL & Architecture (6 🔗2) · Pillar 1: Robot Control (2 🔗1) · Pillar 5: Interaction & Reaction (1) · Pillar 6: Video Extraction (1 🔗1) · Pillar 4: Generative Motion (1 🔗1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 3: Perception & Semantics (11 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
1	GS-CPR: Efficient Camera Pose Refinement via 3D Gaussian Splatting	GS-CPR: efficient camera pose refinement using 3D Gaussian Splatting	3D gaussian splatting 3DGS gaussian splatting	✅
2	OpenScan: A Benchmark for Generalized Open-Vocabulary 3D Scene Understanding	Proposes the OpenScan benchmark for generalized open-vocabulary 3D scene understanding	scene understanding open-vocabulary open vocabulary
3	Near, far: Patch-ordering enhances vision foundation models' scene understanding	Proposes the NeCo loss, which strengthens vision foundation models' scene understanding through patch ordering	scene understanding foundation model
4	SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition	Enhances Emotion-LLaMA with Conv-Attention to improve multimodal emotion recognition performance	open-vocabulary open vocabulary multimodal	✅
5	On the Potential of Open-Vocabulary Models for Object Detection in Unusual Street Scenes	Evaluates the potential of open-vocabulary models for object detection in unusual street scenes, revealing their limitations in open-world settings	open-vocabulary open vocabulary
6	Lightweight Modular Parameter-Efficient Tuning for Open-Vocabulary Object Detection	Proposes UniProj-Det, a lightweight, modular, parameter-efficient framework for open-vocabulary object detection	open-vocabulary open vocabulary
7	TrackNeRF: Bundle Adjusting NeRF from Sparse and Noisy Views via Feature Tracks	TrackNeRF: bundle-adjusts NeRF via feature tracks to handle reconstruction from sparse and noisy views	NeRF neural radiance field
8	DEGAS: Detailed Expressions on Full-Body Gaussian Avatars	Proposes DEGAS to model detailed expressions on full-body Gaussian avatars	3D gaussian splatting 3DGS gaussian splatting
9	Open 3D World in Autonomous Driving	Proposes an open-vocabulary perception method for autonomous driving that fuses 3D point clouds with text information	open-vocabulary open vocabulary multimodal
10	Vocabulary-Free 3D Instance Segmentation with Vision and Language Assistant	Proposes PoVo, the first vocabulary-free 3D instance segmentation method, which uses a vision-language assistant for open scene understanding	open-vocabulary open vocabulary	✅
11	PooDLe: Pooled and dense self-supervised learning from naturalistic videos	PooDLe: combines pooled and dense self-supervised learning to learn representations from naturalistic videos	optical flow

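🔬 Pillar 9: Embodied Foundation Models (7 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
12	Large Language Models for Multimodal Deformable Image Registration	Proposes the LLM-Morph framework, which uses large language models for multimodal deformable image registration	large language model multimodal	✅
13	FLAME: Learning to Navigate with Multimodal LLM in Urban Environments	FLAME: a multimodal-LLM-based approach to learning navigation in urban environments	VLN large language model multimodal
14	ISLES'24 -- A Real-World Longitudinal Multimodal Stroke Dataset	ISLES'24 releases a real-world longitudinal multimodal stroke dataset to support machine learning algorithm development	multimodal
15	ISLES'24: Final Infarct Prediction with Multimodal Imaging and Clinical Data. Where Do We Stand?	ISLES'24 challenge: predicting final infarcts from multimodal imaging and clinical data, revealing current technical bottlenecks	multimodal
16	ViLReF: An Expert Knowledge Enabled Vision-Language Retinal Foundation Model	Proposes ViLReF, an expert-knowledge-driven vision-language pretrained model for retinal images	foundation model	✅
17	HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models	HiRED: an attention-guided token-dropping method for efficient inference of high-resolution vision-language models	large language model multimodal	✅
18	Tapping in a Remote Vehicle's onboard LLM to Complement the Ego Vehicle's Field-of-View	Uses a remote vehicle's onboard LLM to extend the ego vehicle's field of view and improve traffic safety	large language model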

🔬 Pillar 2: RL & Architecture (6 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
19	SenPa-MAE: Sensor Parameter Aware Masked Autoencoder for Multi-Satellite Self-Supervised Pretraining	Proposes SenPa-MAE for self-supervised pretraining on multi-satellite remote sensing imagery, addressing cross-sensor data fusion	masked autoencoder MAE foundation model
20	ShapeSplat: A Large-scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining	ShapeSplat: a large-scale Gaussian splat dataset and its self-supervised pretraining	representation learning MAE 3D gaussian splatting
21	MambaEVT: Event Stream based Visual Object Tracking using State Space Model	Proposes MambaEVT, an event-stream visual object tracking framework based on the Mamba state space model	Mamba state space model	✅
22	MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval	Proposes MUSE, an efficient Mamba-based multi-scale model for text-video retrieval	Mamba
23	Adaptive Knowledge Distillation for Classification of Hand Images using Explainable Vision Transformers	Proposes an adaptive knowledge distillation method based on explainable Vision Transformers for hand image classification	distillation
24	Event Stream-based Sign Language Translation: A High-Definition Benchmark Dataset and A Novel Baseline	Proposes the Event-CSL event-stream sign language translation dataset and the EvSLT baseline model, addressing lighting and privacy issues	Mamba spatiotemporal	✅

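🔬 Pillar 1: Robot Control (2 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
25	Self-supervised Learning of Hybrid Part-aware 3D Representations of 2D Gaussians and Superquadrics	PartGS: proposes a self-supervised hybrid representation learning framework for part-level parsing and reconstruction of 3D scenes	manipulation NeRF
26	A Gray-box Attack against Latent Diffusion Model-based Image Editing by Posterior Collapse	Proposes the Posterior Collapse Attack (PCA) to protect images from unauthorized LDM-based editing	manipulation	✅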

🔬 Pillar 5: Interaction & Reaction (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
27	A Review of Human-Object Interaction Detection	Surveys human-object interaction detection methods in images and analyzes challenges and future trends	human-object interaction HOI

🔬 Pillar 6: Video Extraction (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
28	Multi-view Hand Reconstruction with a Point-Embedded Transformer	Proposes POEM, which uses a point-embedded Transformer for generalizable multi-view hand mesh reconstruction	HMR hand reconstruction	✅

🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
29	CrossFi: A Cross Domain Wi-Fi Sensing Framework Based on Siamese Network	Proposes CrossFi, a Siamese-network-based cross-domain Wi-Fi sensing framework that addresses domain shift	penetration	✅

🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
30	A Noncontact Technique for Wave Measurement Based on Thermal Stereography and Deep Learning	Proposes a noncontact wave measurement technique based on thermal stereography and deep learning	spatiotemporal
