cs.CV (2024-09-04)

📊 21 papers in total | 🔗 6 with code

🎯 Browse by interest area

Pillar 3: Spatial Perception & Semantics (9 🔗2) | Pillar 9: Embodied Foundation Models (5 🔗1) | Pillar 2: RL Algorithms & Architecture (4 🔗1) | Pillar 1: Robot Control (2 🔗1) | Pillar 4: Generative Motion (1 🔗1)

🔬 Pillar 3: Spatial Perception & Semantics (9 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	GGS: Generalizable Gaussian Splatting for Lane Switching in Autonomous Driving	Proposes GGS, a generalizable Gaussian splatting method for lane switching in autonomous driving.	depth estimation 3D gaussian splatting gaussian splatting
2	Object Gaussian for Monocular 6D Pose Estimation from Sparse Views	SGPose: a Gaussian-based method for monocular 6D pose estimation from sparse views.	3D gaussian splatting 3DGS gaussian splatting
3	Plane2Depth: Hierarchical Adaptive Plane Guidance for Monocular Depth Estimation	Plane2Depth: monocular depth estimation using hierarchical adaptive plane guidance.	depth estimation monocular depth metric depth
4	Human-VDM: Learning Single-Image 3D Human Gaussian Splatting from Video Diffusion Models	Human-VDM: generates high-quality 3D human Gaussian splatting models from a single image using video diffusion models.	gaussian splatting splatting	✅
5	iConFormer: Dynamic Parameter-Efficient Tuning with Input-Conditioned Adaptation	iConFormer: a dynamic parameter-efficient fine-tuning method with input-conditioned adaptation.	depth estimation monocular depth
6	UniTT-Stereo: Unified Training of Transformer for Enhanced Stereo Matching	UniTT-Stereo: unified training of a Transformer to improve stereo matching performance.	depth estimation stereo depth
7	TP-GMOT: Tracking Generic Multiple Object by Textual Prompt with Motion-Appearance Cost (MAC) SORT	Proposes the TP-GMOT framework, which tackles generic multiple object tracking using textual prompts and a motion-appearance cost.	open-vocabulary open vocabulary	✅
8	Boosting Generalizability towards Zero-Shot Cross-Dataset Single-Image Indoor Depth by Meta-Initialization	Proposes a meta-learning-based monocular depth estimation method that improves zero-shot cross-dataset generalization.	depth estimation
9	Volumetric Surfaces: Representing Fuzzy Geometries with Layered Meshes	Proposes a volumetric surface representation based on layered meshes for real-time, high-quality rendering of fuzzy geometries such as hair and fur.	splatting

🔬 Pillar 9: Embodied Foundation Models (5 papers)

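#	Title	One-line summary	Tags	🔗	⭐
10	A Medical Multimodal Large Language Model for Pediatric Pneumonia	Proposes P2Med-MLLM, a medical multimodal large language model for pediatric pneumonia that assists diagnosis and treatment.	large language model multimodal
11	ExpLLM: Towards Chain of Thought for Facial Expression Recognition	Proposes ExpLLM, which uses a large language model for chain-of-thought reasoning in facial expression recognition.	large language model chain-of-thought
12	CanvOI, an Oncology Intelligence Foundation Model: Scaling FLOPS Differently	CanvOI: an oncology intelligence foundation model that improves pathology image analysis by scaling FLOPS differently.	foundation model
13	Local Map Construction with SDMap: A Comprehensive Survey	Surveys SDMap-assisted local map construction methods, which offer low-cost, highly available environment perception for intelligent driving.	multimodal
14	Unified Framework with Consistency across Modalities for Human Activity Recognition	Proposes a unified framework based on cross-modal consistency to improve video-based human activity recognition.	multimodal	✅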

🔬 Pillar 2: RL Algorithms & Architecture (4 papers)

#	Title	One-line summary	Tags	🔗	⭐
15	UC-NeRF: Uncertainty-aware Conditional Neural Radiance Fields from Endoscopic Sparse Views	UC-NeRF: an uncertainty-aware conditional neural radiance field method for sparse endoscopic views.	distillation NeRF neural radiance field	✅
16	SG-MIM: Structured Knowledge Guided Efficient Pre-training for Dense Prediction	Proposes SG-MIM, masked image modeling guided by structured knowledge, to improve performance on dense prediction tasks.	representation learning depth estimation monocular depth
17	Non-target Divergence Hypothesis: Toward Understanding Domain Gaps in Cross-Modal Knowledge Distillation	Proposes the non-target divergence hypothesis to analyze domain gaps in cross-modal knowledge distillation.	distillation
18	Collaborative Learning for Enhanced Unsupervised Domain Adaptation	Proposes CLDA, a collaborative learning framework that improves lightweight models on unsupervised domain adaptation.	teacher-student distillation

🔬 Pillar 1: Robot Control (2 papers)

#	Title	One-line summary	Tags	🔗	⭐
19	MADiff: Motion-Aware Mamba Diffusion Models for Hand Trajectory Prediction on Egocentric Videos	MADiff: predicts hand trajectories in egocentric videos with a motion-aware Mamba diffusion model.	manipulation Mamba affordance	✅
20	Incorporating dense metric depth into neural 3D representations for view synthesis and relighting	Proposes a neural 3D representation that incorporates dense metric depth for view synthesis and relighting.	manipulation metric depth

🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
21	PoseTalk: Text-and-Audio-based Pose Control and Motion Refinement for One-Shot Talking Head Generation	PoseTalk: a one-shot talking head generation method with text-and-audio-based pose control and motion refinement.	motion latent	✅
