cs.CV (2024-09-12)

📊 28 papers in total | 🔗 8 with code

🎯 Interest Areas

Pillar 3: Perception & Semantics (10 🔗4) Pillar 2: RL & Architecture (8 🔗2) Pillar 9: Embodied Foundation Models (7 🔗1) Pillar 4: Generative Motion (1 🔗1) Pillar 7: Motion Retargeting (1) Pillar 1: Robot Control (1)

🔬 Pillar 3: Perception & Semantics (10 papers)

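#	Title	One-line summary	Tags	🔗	⭐
1	Advancing Depth Anything Model for Unsupervised Monocular Depth Estimation in Endoscopy	Proposes an improved Depth Anything model for unsupervised monocular depth estimation on endoscopic images	depth estimation monocular depth Depth Anything
2	FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally	Proposes FlashSplat, which uses linear programming to solve 2D-to-3D Gaussian Splatting segmentation optimally	3D gaussian splatting gaussian splatting splatting	✅
3	SwinGS: Sliding Window Gaussian Splatting for Volumetric Video Streaming with Arbitrary Length	SwinGS: a sliding-window Gaussian Splatting framework for real-time streaming of volumetric video of arbitrary length	3D gaussian splatting 3DGS gaussian splatting
4	Open-Vocabulary Remote Sensing Image Semantic Segmentation	Proposes an open-vocabulary semantic segmentation framework for remote sensing images that addresses orientation and scale variation	semantic map open-vocabulary open vocabulary	✅
5	LED: Light Enhanced Depth Estimation at Night	LED: uses headlight illumination to enhance nighttime depth estimation, improving autonomous driving safety	depth estimation Depth Anything scene understanding
6	Thermal3D-GS: Physics-induced 3D Gaussians for Thermal Infrared Novel-view Synthesis	Thermal3D-GS: 3D Gaussians with physics priors for thermal infrared novel-view synthesis	3D gaussian splatting gaussian splatting splatting	✅
7	Expansive Supervision for Neural Radiance Field	Proposes Expansive Supervision, which supervises a selected subset of rays to accelerate NeRF training and reduce time and memory cost	NeRF neural radiance field
8	FIReStereo: Forest InfraRed Stereo Dataset for UAS Depth Perception in Visually Degraded Environments	FIReStereo: a forest infrared stereo dataset for UAS depth perception in visually degraded environments	depth estimation stereo depth
9	Depth on Demand: Streaming Dense Depth from a Low Frame Rate Active Sensor	Proposes Depth on Demand, which combines a low-frame-rate depth sensor with a high-frame-rate RGB camera to produce an accurate dense depth stream	depth estimation
10	Deep Height Decoupling for Precise Vision-based 3D Occupancy Prediction	Proposes the Deep Height Decoupling (DHD) framework to improve the accuracy of vision-based 3D occupancy prediction	height map	✅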

🔬 Pillar 2: RL & Architecture (8 papers)

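#	Title	One-line summary	Tags	🔗	⭐
11	DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors	DreamHOI: subject-driven generation of 3D human-object interactions using diffusion priors	distillation NeRF neural radiance field
12	Real-time Multi-view Omnidirectional Depth Estimation for Real Scenarios based on Teacher-Student Learning with Unlabeled Data	Proposes Rt-OmniMVS, a real-time multi-view omnidirectional depth estimation method based on teacher-student learning, suited to real-world scenarios	teacher-student depth estimation
13	Do Vision Foundation Models Enhance Domain Generalization in Medical Image Segmentation?	Uses vision foundation models with an HQHSAM decode head to improve domain generalization in medical image segmentation	MAE foundation model	✅
14	MambaMIC: An Efficient Baseline for Microscopic Image Classification with State Space Models	MambaMIC: an efficient baseline for microscopic image classification based on state space models	Mamba SSM state space model	✅
15	CollaMamba: Efficient Collaborative Perception with Cross-Agent Spatial-Temporal State Space Model	CollaMamba: a collaborative perception method based on a spatial-temporal state space model, aimed at improving efficiency	Mamba SSM state space model
16	Top-down Activity Representation Learning for Video Question Answering	Proposes a video question answering method based on top-down activity representation learning, improving understanding of events in long temporal contexts	representation learning multimodal
17	Learning Brain Tumor Representation in 3D High-Resolution MR Images via Interpretable State Space Models	Proposes a masked autoencoder based on state space models for learning representations of 3D high-resolution brain tumor MR images	SSM state space model masked autoencoder
18	Multi-object event graph representation learning for Video Question Answering	Proposes CLanG, which uses contrastive learning of multi-object event graph representations to improve understanding of complex scenes in video question answering	representation learning contrastive learning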

🔬 Pillar 9: Embodied Foundation Models (7 papers)

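#	Title	One-line summary	Tags	🔗	⭐
19	Large Language Model-Guided Semantic Alignment for Human Activity Recognition	LanHAR: human activity recognition via semantic alignment guided by large language models	large language model	✅
20	SimMAT: Exploring Transferability from Vision Foundation Models to Any Image Modality	SimMAT: explores transferability from vision foundation models to any image modality	foundation model
21	Deep Multimodal Learning with Missing Modality: A Survey	Surveys deep multimodal learning methods under missing modalities, addressing missing modality data in real-world applications	multimodal
22	HiRT: Enhancing Robotic Control with Hierarchical Robot Transformers	HiRT: enhances robotic control with hierarchical robot Transformers, enabling real-time interaction in dynamic tasks	vision-language-action VLA
23	What Makes a Maze Look Like a Maze?	Proposes the Deep Schema Grounding (DSG) framework to improve understanding of and reasoning about visual abstract concepts	large language model
24	Bayesian Self-Training for Semi-Supervised 3D Segmentation	Proposes a Bayesian self-training framework for semi-supervised 3D segmentation, improving segmentation accuracy when labeled data is scarce	visual grounding
25	Bridging Paintings and Music -- Exploring Emotion based Music Generation through Paintings	Proposes an emotion-based model that generates music from paintings, bridging visual art and music	multimodal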

🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
26	ProbTalk3D: Non-Deterministic Emotion Controllable Speech-Driven 3D Facial Animation Synthesis Using VQ-VAE	Proposes ProbTalk3D for emotion-controllable, speech-driven 3D facial animation synthesis	VQ-VAE	✅

🔬 Pillar 7: Motion Retargeting (1 paper)

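#	Title	One-line summary	Tags	🔗	⭐
27	Depth Matters: Exploring Deep Interactions of RGB-D for Semantic Segmentation in Traffic Scenes	Proposes a deep-interaction pyramid Transformer to address insufficient use of depth information in semantic segmentation of traffic scenes	spatial relationship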

🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
28	GAZEploit: Remote Keystroke Inference Attack by Gaze Estimation from Avatar Views in VR/MR Devices	GAZEploit: a remote keystroke inference attack using gaze estimation from avatar views in VR/MR devices	Apple Vision Pro
