cs.CV (2024-08-30)

📊 18 papers in total | 🔗 5 with code

🎯 Interest Areas

Pillar 3: Perception & Semantics (9 🔗3) · Pillar 2: RL & Architecture (4 🔗1) · Pillar 9: Embodied Foundation Models (2) · Pillar 6: Video Extraction (1) · Pillar 4: Generative Motion (1 🔗1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 3: Perception & Semantics (9 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	DARES: Depth Anything in Robotic Endoscopic Surgery with Self-supervised Vector-LoRA of the Foundation Model	DARES: improves the Depth Anything model for robotic endoscopic surgery using self-supervised Vector-LoRA.	depth estimation, monocular depth, Depth Anything	✅
2	Open-Vocabulary Action Localization with Iterative Visual Prompting	Proposes an open-vocabulary action localization method based on iterative visual prompting that localizes actions in video without training.	open-vocabulary, open vocabulary	✅
3	AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding	AdaptVision: dynamic input scaling in MLLMs for versatile scene understanding.	scene understanding, large language model, multimodal	✅
4	UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios	UrBench: a comprehensive benchmark for evaluating large models in multi-view urban scenarios.	scene understanding, multimodal
5	Synthetic Lunar Terrain: A Multimodal Open Dataset for Training and Evaluating Neuromorphic Vision Algorithms	Presents Synthetic Lunar Terrain (SLT), a multimodal open dataset for training and evaluating neuromorphic vision algorithms.	depth estimation, multimodal
6	OG-Mapping: Octree-based Structured 3D Gaussians for Online Dense Mapping	OG-Mapping: an online dense mapping method based on octree-structured 3D Gaussians.	3D gaussian splatting, 3DGS, gaussian splatting
7	2DGH: 2D Gaussian-Hermite Splatting for High-quality Rendering and Better Geometry Reconstruction	Proposes 2D Gaussian splatting with Gaussian-Hermite kernels to improve rendering quality and geometry reconstruction.	gaussian splatting, splatting
8	BOP-Distrib: Revisiting 6D Pose Estimation Benchmarks for Better Evaluation under Visual Ambiguities	BOP-Distrib: revisits 6D pose estimation benchmarks to improve evaluation under visual ambiguities.	6D pose estimation
9	ConDense: Consistent 2D/3D Pre-training for Dense and Sparse Features from Multi-View Images	Proposes the ConDense framework to address feature consistency in training 3D foundation models.	NeRF, foundation model

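🔬 Pillar 2: RL & Architecture (4 papers)

#	Title	One-line summary	Tags	🔗	⭐
10	VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters	VisionTS: zero-shot time series forecasting with visual masked autoencoders.	masked autoencoder, large language model, foundation model	✅
11	Stochastic Layer-Wise Shuffle for Improving Vision Mamba Training	Proposes a stochastic layer-wise shuffle method that improves Vision Mamba training on ImageNet.	Mamba
12	Instant Adversarial Purification with Adversarial Consistency Distillation	Proposes OSCP, which uses adversarial consistency distillation to purify adversarial examples with a single-step diffusion model, significantly improving efficiency.	distillation
13	Contrastive Learning with Synthetic Positives	Proposes CLSP, which uses synthetic images as additional positives for contrastive learning, improving self-supervised learning performance.	contrastive learning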

🔬 Pillar 9: Embodied Foundation Models (2 papers)

#	Title	One-line summary	Tags	🔗	⭐
14	NanoMVG: USV-Centric Low-Power Multi-Task Visual Grounding based on Prompt-Guided Camera and 4D mmWave Radar	NanoMVG: a low-power multi-task visual grounding model for USVs that fuses prompt-guided camera and 4D mmWave radar.	visual grounding
15	From Latent to Engine Manifolds: Analyzing ImageBind's Multimodal Embedding Space	Uses ImageBind to analyze the multimodal embedding space and generate meaningful fused embeddings for online auto parts listings.	multimodal

🔬 Pillar 6: Video Extraction (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
16	EMHI: A Multimodal Egocentric Human Motion Dataset with HMD and Body-Worn IMUs	Presents the EMHI multimodal dataset for egocentric human motion estimation from HMDs and IMUs in VR/AR.	SMPL, egocentric, multimodal

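🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
17	TIMotion: Temporal and Interactive Framework for Efficient Human-Human Motion Generation	TIMotion: proposes a temporal and interactive framework for efficiently generating human-human interaction motion.	motion generation	✅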

🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
18	Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning	Proposes video-level blending data and a spatiotemporal adapter to improve the generalization of deepfake video detection.	spatiotemporal
