cs.CV (2024-10-25)

📊 19 papers in total | 🔗 7 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (6 🔗4) | Pillar 9: Embodied Foundation Models (6) | Pillar 2: RL Algorithms & Architecture (4 🔗2) | Pillar 6: Video Extraction & Matching (1 🔗1) | Pillar 4: Generative Motion (1) | Pillar 8: Physics-based Animation (1)

🔬 Pillar 3: Spatial Perception & Semantics (6 papers)

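#	Title	One-line summary	Tags	🔗	⭐
1	DiffGS: Functional Gaussian Splatting Diffusion	Proposes DiffGS, a latent-diffusion-based generative method for functional Gaussian splatting that enables high-quality, fast rendering.	3D gaussian splatting 3DGS gaussian splatting
2	ArCSEM: Artistic Colorization of SEM Images via Gaussian Splatting	Proposes ArCSEM, a Gaussian-splatting-based method for automatic artistic colorization of scanning electron microscope (SEM) images.	gaussian splatting splatting	✅
3	Content-Aware Radiance Fields: Aligning Model Complexity with Scene Intricacy Through Learned Bitwidth Quantization	Proposes content-aware radiance fields that align model complexity with scene complexity via adversarial content-aware quantization.	3D gaussian splatting gaussian splatting splatting	✅
4	Evaluation of strategies for efficient rate-distortion NeRF streaming	Studies the rate-distortion performance of NeRF streaming and proposes strategies for streaming neural network parameters.	NeRF neural radiance field scene reconstruction
5	MonoDGP: Monocular 3D Object Detection with Decoupled-Query and Geometry-Error Priors	MonoDGP: monocular 3D object detection using decoupled queries and geometry-error priors.	depth estimation metric depth	✅
6	FastPCI: Motion-Structure Guided Fast Point Cloud Frame Interpolation	FastPCI: a fast point cloud frame interpolation method guided by motion and structure.	scene flow	✅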

🔬 Pillar 9: Embodied Foundation Models (6 papers)

#	Title	One-line summary	Tags	🔗	⭐
7	OReole-FM: successes and challenges toward billion-parameter foundation models for high-resolution satellite imagery	OReole-FM: exploring billion-parameter remote-sensing foundation models for high-resolution satellite imagery.	foundation model
8	A Multimodal Approach For Endoscopic VCE Image Classification Using BiomedCLIP-PubMedBERT	Proposes a multimodal approach based on BiomedCLIP-PubMedBERT for classifying endoscopic VCE images.	multimodal
9	TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning	TimeSuite: improving MLLMs' long-video understanding via grounded tuning.	large language model multimodal TAMP
10	Frozen-DETR: Enhancing DETR with Image Understanding from Frozen Foundation Models	Frozen-DETR: enhancing DETR object detection with frozen pretrained models.	foundation model
11	Turn-by-Turn Indoor Navigation for the Visually Impaired	Proposes a turn-by-turn indoor navigation system for blind users, built on a smartphone and a Raspberry Pi.	large language model multimodal
12	MaCTG: Multi-Agent Collaborative Thought Graph for Automatic Programming	Proposes MaCTG, which uses a multi-agent collaborative graph to address inefficient task planning and hallucination in automatic programming.	large language model

🔬 Pillar 2: RL Algorithms & Architecture (4 papers)

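#	Title	One-line summary	Tags	🔗	⭐
13	Exploring Self-Supervised Learning with U-Net Masked Autoencoders and EfficientNet B7 for Improved Classification	Proposes a self-supervised learning method based on U-Net masked autoencoders and EfficientNet B7 to improve image classification accuracy.	masked autoencoder
14	Topology-aware Mamba for Crack Segmentation in Structures	Proposes CrackMamba for crack segmentation in infrastructure.	Mamba	✅
15	Diverse Sign Language Translation	Proposes the DivSLT task to address the one-to-many mapping problem in sign language translation, improving translation diversity and accuracy.	reinforcement learning large language model
16	Fusion-then-Distillation: Toward Cross-modal Positive Distillation for Domain Adaptive 3D Semantic Segmentation	Proposes Fusion-then-Distillation, a method for cross-modal positive distillation in domain-adaptive 3D semantic segmentation.	distillation	✅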

🔬 Pillar 6: Video Extraction & Matching (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
17	x-RAGE: eXtended Reality -- Action & Gesture Events Dataset	x-RAGE: the first event-camera dataset for action and gesture events in extended reality.	egocentric first-person view	✅

🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
18	FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality	FasterCache: a high-quality, training-free acceleration strategy for video diffusion models.	classifier-free guidance

🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
19	Unsupervised Machine Learning for Detecting and Locating Human-Made Objects in 3D Point Cloud	Proposes an unsupervised learning method for detecting and locating human-made objects in 3D point clouds.	PULSE
