cs.CV (2024-08-19)

📊 27 papers in total | 🔗 6 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (11 🔗4) | Pillar 9: Embodied Foundation Models (8 🔗1) | Pillar 3: Perception & Semantics (6 🔗1) | Pillar 1: Robot Control (1) | Pillar 7: Motion Retargeting (1)

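🔬 Pillar 2: RL & Architecture (11 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	3D-Consistent Human Avatars with Sparse Inputs via Gaussian Splatting and Contrastive Learning	CHASE: 3D-consistent human avatars from sparse inputs using Gaussian splatting and contrastive learning	contrastive learning 3D gaussian splatting 3DGS
2	ExpoMamba: Exploiting Frequency SSM Blocks for Efficient and Effective Image Enhancement	ExpoMamba: uses frequency SSM blocks for efficient image enhancement under low-light and mixed-exposure conditions	Mamba SSM foundation model
3	R2GenCSR: Retrieving Context Samples for Large Language Model based X-ray Medical Report Generation	R2GenCSR: a context-retrieval framework for X-ray medical report generation that improves LLM output quality	Mamba large language model	✅
4	MambaLoc: Efficient Camera Localisation via State Space Model	MambaLoc: an efficient camera localization method based on state space models that addresses high training cost and data sparsity	Mamba SSM state space model
5	OccMamba: Semantic Occupancy Prediction with State Space Models	Proposes OccMamba, the first Mamba-based semantic occupancy prediction network, improving efficiency and accuracy	Mamba state space model	✅
6	$R^2$-Mesh: Reinforcement Learning Powered Mesh Reconstruction via Geometry and Appearance Refinement	Proposes a reinforcement-learning-based mesh reconstruction method that improves NeRF reconstruction quality through geometry and appearance refinement	reinforcement learning NeRF neural radiance field
7	Factorized-Dreamer: Training A High-Quality Video Generator with Limited and Low-Quality Data	Factorized-Dreamer: trains a high-quality video generator from limited, low-quality data	dreamer optical flow spatiotemporal	✅
8	P3P: Pseudo-3D Pre-training for Scaling 3D Voxel-based Masked Autoencoders	Proposes the P3P framework, which uses pseudo-3D pre-training to scale voxel-based masked autoencoders and improve performance on 3D perception tasks	masked autoencoder MAE depth estimation	✅
9	Multi-Scale Representation Learning for Image Restoration with State-Space Model	Proposes MS-Mamba, a multi-scale image restoration network based on state space models, for efficient, high-quality image reconstruction	Mamba SSM representation learning
10	CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing Hallucinations in LVLMs	CLIP-DPO: uses vision-language models as a preference source for preference optimization to reduce LVLM hallucinations	DPO
11	C${^2}$RL: Content and Context Representation Learning for Gloss-free Sign Language Translation and Retrieval	Proposes C${^2}$RL for gloss-free sign language translation and retrieval, improving representation learning	representation learning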

🔬 Pillar 9: Embodied Foundation Models (8 papers)

#	Title	One-line summary	Tags	🔗	⭐
12	FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant	Proposes FFAA: an explainable open-world face forgery analysis assistant based on a multimodal large language model	large language model multimodal
13	Kubrick: Multimodal Agent Collaborations for Synthetic Video Generation	Kubrick: a synthetic video generation framework based on multimodal agent collaboration	large language model multimodal instruction following
14	CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving	Proposes the CoVLA dataset for training vision-language-action models for autonomous driving	vision-language-action VLA large language model
15	Pedestrian Attribute Recognition: A New Benchmark Dataset and A Large Language Model Augmented Framework	Proposes the MSP60K dataset and the LLM-PAR framework for pedestrian attribute recognition	large language model	✅
16	Narrowing the Gap between Vision and Action in Navigation	Proposes a low-level action decoder and a semantically enhanced waypoint predictor to improve vision-and-language navigation in continuous environments	VLN
17	LongVILA: Scaling Long-Context Visual Language Models for Long Videos	LongVILA: extends visual language models to long-video contexts through algorithm-system co-design	foundation model
18	Video Object Segmentation via SAM 2: The 4th Solution for LSVOS Challenge VOS Track	Video object segmentation with SAM 2; 4th place in the LSVOS Challenge VOS track	foundation model
19	Attribution Analysis Meets Model Editing: Advancing Knowledge Correction in Vision Language Models with VisEdit	VisEdit: corrects knowledge in vision-language models by editing visual representations	large language model

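🔬 Pillar 3: Perception & Semantics (6 papers)

#	Title	One-line summary	Tags	🔗	⭐
20	Implicit Gaussian Splatting with Efficient Multi-Level Tri-Plane Representation	Proposes implicit Gaussian splatting with a multi-level tri-plane representation for efficient storage and high-quality rendering	3DGS gaussian splatting splatting
21	Topology-aware Human Avatars with Semantically-guided Gaussian Splatting	Proposes SG-GS, which reconstructs topology-aware human avatars using semantically guided Gaussian splatting	gaussian splatting splatting SMPL
22	SHARP: Segmentation of Hands and Arms by Range using Pseudo-Depth for Enhanced Egocentric 3D Hand Pose Estimation and Action Recognition	SHARP: segments hands and arms using pseudo-depth to improve egocentric 3D hand pose estimation and action recognition	depth estimation egocentric
23	3D-Aware Instance Segmentation and Tracking in Egocentric Videos	Proposes a 3D-aware instance segmentation and tracking method for egocentric videos to improve scene understanding	scene understanding egocentric
24	NeuFlow v2: Push High-Efficiency Optical Flow To the Limit	NeuFlow v2: pushes the efficiency limit of optical flow estimation while balancing accuracy and speed	optical flow	✅
25	LoopSplat: Loop Closure by Registering 3D Gaussian Splats	LoopSplat: performs loop closure by registering 3D Gaussian splats, improving the global consistency of RGB-D SLAM	3DGS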

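🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
26	Structure-preserving Image Translation for Depth Estimation in Colonoscopy Video	Proposes a structure-preserving image translation method that improves depth estimation accuracy in colonoscopy video	sim2real depth estimation monocular depth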

🔬 Pillar 7: Motion Retargeting (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
27	SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views	SpaRP: a fast method for 3D object reconstruction and pose estimation from sparse views	spatial relationship
