cs.CV (2024-09-11)

📊 23 papers in total | 🔗 4 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (8 🔗1) · Pillar 3: Perception & Semantics (6 🔗2) · Pillar 1: Robot Control (4) · Pillar 6: Video Extraction (2) · Pillar 2: RL & Architecture (2 🔗1) · Pillar 4: Generative Motion (1)

🔬 Pillar 9: Embodied Foundation Models (8 papers)

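# | Title | Key point | Tags | 🔗
1 | Foundation Models Boost Low-Level Perceptual Similarity Metrics | Uses intermediate-layer features to improve perceptual similarity metrics for full-reference image quality assessment, with no training required. | foundation model
2 | MRAC Track 1: 2nd Workshop on Multimodal, Generative and Responsible Affective Computing | MRAC 2024 Track 1: a workshop on multimodal, generative affective computing and responsible AI. | multimodal
3 | Multimodal Emotion Recognition with Vision-language Prompting and Modality Dropout | Proposes EmoVCLIP, which combines vision-language prompt learning with modality dropout to improve multimodal emotion recognition accuracy. | multimodal
4 | Learning to Compress Contexts for Efficient Knowledge-based Visual Question Answering | Proposes RACC, which compresses contexts to improve the efficiency and performance of knowledge-based visual question answering. | large language model, multimodal
5 | Self-Masking Networks for Unsupervised Adaptation | Proposes a self-supervised masking network that efficiently fine-tunes pretrained models for downstream tasks. | foundation model
6 | PiTe: Pixel-Temporal Alignment for Large Video-Language Model | PiTe: pixel-temporal alignment for large video-language models. | large language model
7 | MVLLaVA: An Intelligent Agent for Unified and Flexible Novel View Synthesis | MVLLaVA: an intelligent agent for unified and flexible novel view synthesis. | multimodal
8 | FSMDet: Vision-guided feature diffusion for fully sparse 3D detector | FSMDet: a fully sparse 3D detector with vision-guided feature diffusion, improving efficiency and accuracy. | multimodal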

🔬 Pillar 3: Perception & Semantics (6 papers)

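# | Title | Key point | Tags | 🔗
9 | ThermalGaussian: Thermal 3D Gaussian Splatting | Proposes ThermalGaussian for high-quality 3D Gaussian reconstruction and real-time rendering in RGB and thermal modalities. | 3D gaussian splatting, 3DGS, gaussian splatting
10 | Self-Evolving Depth-Supervised 3D Gaussian Splatting from Rendered Stereo Pairs | Proposes self-evolving depth-supervised 3D Gaussian splatting, using rendered stereo pairs to improve depth accuracy. | 3D gaussian splatting, gaussian splatting, splatting
11 | Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models | Hi3D: high-resolution image-to-3D generation with video diffusion models. | 3D gaussian splatting, gaussian splatting, splatting
12 | Violence detection in videos using deep recurrent and convolutional neural networks | Proposes a deep learning architecture combining RNNs and CNNs for violence detection in videos. | optical flow
13 | DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation | DreamMesh: jointly manipulates and textures triangle meshes for high-quality text-to-3D generation. | NeRF
14 | Rethinking Directional Parameterization in Neural Implicit Surface Reconstruction | Proposes a hybrid directional parameterization that improves neural implicit surface reconstruction for complex materials and geometry. | implicit representation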

🔬 Pillar 1: Robot Control (4 papers)

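# | Title | Key point | Tags | 🔗
15 | StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos | StereoCrafter: proposes a diffusion-based method for generating high-quality stereoscopic 3D video from monocular video. | Apple Vision Pro, splatting, foundation model
16 | Controllable retinal image synthesis using conditional StyleGAN and latent space manipulation for improved diagnosis and grading of diabetic retinopathy | Proposes a retinal image synthesis method based on conditional StyleGAN and latent space manipulation to improve diabetic retinopathy diagnosis. | manipulation
17 | Feature Importance in Pedestrian Intention Prediction: A Context-Aware Review | Proposes Context-aware Permutation Feature Importance (CAPFI) to improve the interpretability of pedestrian intention prediction models. | locomotion, predictive model
18 | Single-View 3D Reconstruction via SO(2)-Equivariant Gaussian Sculpting Networks | Proposes SO(2)-equivariant Gaussian sculpting networks for single-view 3D reconstruction. | manipulation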

🔬 Pillar 6: Video Extraction (2 papers)

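# | Title | Key point | Tags | 🔗
19 | Benchmarking 2D Egocentric Hand Pose Datasets | Proposes a new dataset evaluation protocol for 2D egocentric hand pose estimation and benchmarks existing datasets. | egocentric
20 | FaVoR: Features via Voxel Rendering for Camera Relocalization | FaVoR: uses voxel-rendered features for camera relocalization, improving robustness under viewpoint changes. | feature matching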

🔬 Pillar 2: RL & Architecture (2 papers)

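# | Title | Key point | Tags | 🔗
21 | Current Symmetry Group Equivariant Convolution Frameworks for Representation Learning | Surveys symmetry group equivariant convolution frameworks for representation learning in non-Euclidean spaces. | representation learning
22 | Retinex-RAWMamba: Bridging Demosaicing and Denoising for Low-Light RAW Image Enhancement | Proposes Retinex-RAWMamba, which bridges demosaicing and denoising for low-light RAW image enhancement. | Mamba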

🔬 Pillar 4: Generative Motion (1 paper)

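# | Title | Key point | Tags | 🔗
23 | DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures | DiffTED: diffusion-based one-shot audio-driven TED talk video generation with natural lip movements and rich body gestures. | classifier-free guidance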
