cs.CV (2024-11-05)

📊 20 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 2: RL Algorithms & Architecture (7 🔗3) · Pillar 9: Embodied Foundation Models (6 🔗2) · Pillar 3: Spatial Perception & Semantics (5) · Pillar 6: Video Extraction & Matching (1) · Pillar 1: Robot Control (1)

🔬 Pillar 2: RL Algorithms & Architecture (7 papers)

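| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Efficient and Effective Adaptation of Multimodal Foundation Models in Sequential Recommendation | Proposes the IISAN-Versa framework for efficiently adapting multimodal foundation models to sequential recommendation, achieving SOTA performance. | representation learning, large language model, foundation model | |
| 2 | Object and Contact Point Tracking in Demonstrations Using 3D Gaussian Splatting | Tracks objects and contact points in interaction demonstrations using 3D Gaussian Splatting. | imitation learning, 3D gaussian splatting, gaussian splatting | |
| 3 | V-DPO: Mitigating Hallucination in Large Vision Language Models via Vision-Guided Direct Preference Optimization | Proposes V-DPO, which mitigates hallucination in large vision-language models via vision-guided direct preference optimization. | preference learning, DPO, direct preference optimization | |
| 4 | Estimating Ego-Body Pose from Doubly Sparse Egocentric Video Data | Proposes an ego-body pose estimation method for doubly sparse egocentric video, improving motion capture accuracy. | masked autoencoder, egocentric, spatiotemporal | |
| 5 | LiVOS: Light Video Object Segmentation with Gated Linear Matching | LiVOS: lightweight video object segmentation with gated linear matching. | linear attention, spatiotemporal, foundation model | |
| 6 | Pre-trained Visual Dynamics Representations for Efficient Policy Learning | Proposes PVDR, which uses pre-trained visual dynamics representations to make reinforcement learning policy learning more efficient. | reinforcement learning, policy learning | |
| 7 | ShadowMamba: State-Space Model with Boundary-Region Selective Scan for Shadow Removal | ShadowMamba: a state-space model with boundary-region selective scan for shadow removal. | Mamba | |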

🔬 Pillar 9: Embodied Foundation Models (6 papers)

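| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 8 | Personalized Video Summarization by Multimodal Video Understanding | Proposes a personalized video summarization method based on multimodal video understanding, improving user experience. | multimodal | |
| 9 | MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning | MME-Finance: a multimodal finance benchmark for expert-level understanding and reasoning. | multimodal | |
| 10 | Fine-Grained Spatial and Verbal Losses for 3D Visual Grounding | Proposes AsphaltNet, which improves 3D visual grounding performance through fine-grained spatial and verbal losses. | visual grounding | |
| 11 | FlexCAD: Unified and Versatile Controllable CAD Generation with Fine-tuned Large Language Models | Proposes FlexCAD to address the inefficiency of controllable CAD generation. | large language model | |
| 12 | Inference Optimal VLMs Need Fewer Visual Tokens and More Parameters | Inference optimization for vision-language models: using fewer visual tokens and more model parameters is more effective. | large language model | |
| 13 | CRT-Fusion: Camera, Radar, Temporal Fusion Using Motion Information for 3D Object Detection | CRT-Fusion: a 3D object detection method that fuses camera, radar, and temporal information. | TAMP | |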

🔬 Pillar 3: Spatial Perception & Semantics (5 papers)

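| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 14 | Multi-modal NeRF Self-Supervision for LiDAR Semantic Segmentation | Proposes a multi-modal NeRF self-supervision framework that improves LiDAR semantic segmentation in autonomous driving scenes. | NeRF, foundation model | |
| 15 | CAD-NeRF: Learning NeRFs from Uncalibrated Few-view Images by CAD Model Retrieval | CAD-NeRF: learns NeRFs from uncalibrated few-view images via CAD model retrieval. | NeRF, neural radiance field | |
| 16 | Exploring Seasonal Variability in the Context of Neural Radiance Fields for 3D Reconstruction on Satellite Imagery | Proposes Planet-NeRF, which uses monthly embedding vectors to improve seasonal prediction in NeRFs for satellite imagery. | NeRF, neural radiance field | |
| 17 | Correlation of Object Detection Performance with Visual Saliency and Depth Estimation | Studies how object detection performance correlates with visual saliency and depth estimation, offering guidance for optimizing model architectures. | depth estimation, Depth Anything | |
| 18 | HFGaussian: Learning Generalizable Gaussian Human with Integrated Human Features | HFGaussian: a generalizable Gaussian human modeling method that integrates human features. | 3D gaussian splatting, gaussian splatting, splatting | |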

🔬 Pillar 6: Video Extraction & Matching (1 paper)

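| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 19 | Self Supervised Networks for Learning Latent Space Representations of Human Body Scans and Motions | Proposes the self-supervised networks VariShaPE and MoGeN for learning latent space representations of human body scans and motions. | SMPL | |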

🔬 Pillar 1: Robot Control (1 paper)

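| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 20 | Lost in Context: The Influence of Context on Feature Attribution Methods for Object Recognition | Studies the influence of context on feature attribution methods for object recognition models. | manipulation | |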
