cs.CV (2025-01-13)

📊 24 papers in total | 🔗 5 with code

🎯 Navigation by Area of Interest

Pillar 2: RL Algorithms & Architecture (10 🔗3)
Pillar 3: Spatial Perception & Semantics (8 🔗2)
Pillar 9: Embodied Foundation Models (2)
Pillar 6: Video Extraction & Matching (2)
Pillar 1: Robot Control (2)

🔬 Pillar 2: RL Algorithms & Architecture (10 papers)

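# | Title | Key Point | Tags
1 | Dynamic Multimodal Fusion via Meta-Learning Towards Micro-Video Recommendation | Proposes MetaMMF, a meta-learning-based dynamic multimodal fusion framework that improves micro-video recommendation. | representation learning, multimodal
2 | MSV-Mamba: A Multiscale Vision Mamba Network for Echocardiography Segmentation | Proposes MSV-Mamba to improve echocardiography segmentation accuracy, especially on complex structures. | Mamba, spatiotemporal
3 | Dataset Distillation via Committee Voting | Proposes CV-DD, a committee-voting-based dataset distillation method that improves the generalization of the resulting small dataset. | distillation
4 | Localization-Aware Multi-Scale Representation Learning for Repetitive Action Counting | Proposes the LMRL framework, which improves repetitive action counting accuracy through localization-aware multi-scale representation learning. | representation learning
5 | Skip Mamba Diffusion for Monocular 3D Semantic Scene Completion | Proposes a Skip Mamba diffusion model for monocular 3D semantic scene completion, significantly improving performance. | Mamba
6 | EdgeTAM: On-Device Track Anything Model | Proposes EdgeTAM, which accelerates SAM 2 with a 2D Spatial Perceiver to enable on-device video segmentation. | distillation, foundation model
7 | SAMKD: Spatial-aware Adaptive Masking Knowledge Distillation for Object Detection | Proposes SAMKD, a spatial-aware adaptive masking knowledge distillation framework that improves object detection performance. | distillation
8 | Representation Learning of Point Cloud Upsampling in Global and Local Inputs | ReLPU: improves point cloud upsampling through global and local feature learning. | representation learning
9 | Rethinking Knowledge in Distillation: An In-context Sample Retrieval Perspective | Proposes the IC-KD framework, which redefines the knowledge in knowledge distillation from an in-context sample retrieval perspective to improve model performance. | distillation
10 | CSTA: Spatial-Temporal Causal Adaptive Learning for Exemplar-Free Video Class-Incremental Learning | Proposes the CSTA framework, which tackles exemplar-free video class-incremental learning through spatial-temporal causal adaptive learning. | distillation, spatiotemporal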
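
Entries 7 and 9 above deal explicitly with knowledge distillation. For readers new to the term, the classic soft-target distillation loss (Hinton et al., 2015) can be sketched as follows. This is a generic baseline, not the method of any listed paper; the function names and the default temperature and weight values are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, scaled by 1/temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Soft-target knowledge-distillation loss for a single sample (illustrative defaults).

    alpha weights the soft-target KL term and (1 - alpha) weights the hard-label
    cross-entropy. The KL term is multiplied by T^2 so its gradient scale stays
    comparable when the temperature changes.
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    # KL(teacher || student) between the temperature-softened distributions
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    # Ordinary cross-entropy against the ground-truth class at T = 1
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce
```

Raising the temperature flattens both distributions, so the student also learns the teacher's relative scores for the non-target classes, not just its top prediction.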

🔬 Pillar 3: Spatial Perception & Semantics (8 papers)

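# | Title | Key Point | Tags
11 | SplatMAP: Online Dense Monocular SLAM with 3D Gaussian Splatting | SplatMAP: online dense monocular SLAM built on 3D Gaussian splatting, improving reconstruction quality. | 3D gaussian splatting, 3DGS, gaussian splatting
12 | Evaluating Human Perception of Novel View Synthesis: Subjective Quality Assessment of Gaussian Splatting and NeRF in Dynamic Scenes | Assesses the human-perceived quality of novel view synthesis by Gaussian splatting and NeRF in dynamic scenes. | gaussian splatting, splatting, NeRF
13 | Aligning First, Then Fusing: A Novel Weakly Supervised Multimodal Violence Detection Method | Proposes a weakly supervised multimodal violence detection method that improves detection accuracy through modality alignment. | optical flow, feature matching, multimodal
14 | Zero-Shot Scene Understanding for Automatic Target Recognition Using Large Vision-Language Models | Combines an open-world detector with a vision-language model to achieve zero-shot automatic target recognition. | scene understanding
15 | RePoseD: Efficient Relative Pose Estimation With Known Depth Information | RePoseD: an efficient relative pose estimation method that exploits known depth information. | depth estimation, monocular depth
16 | Matching-Free Depth Recovery from Structured Light | Proposes a matching-free depth recovery method for structured light that improves geometric accuracy and training speed. | depth estimation, implicit representation
17 | RMAvatar: Photorealistic Human Avatar Reconstruction from Monocular Video Based on Rectified Mesh-embedded Gaussians | RMAvatar: photorealistic human avatar reconstruction from monocular video using rectified mesh-embedded Gaussians. | gaussian splatting, splatting
18 | Hierarchical Superpixel Segmentation via Structural Information Theory | Proposes a hierarchical superpixel segmentation method based on structural information theory to address the under-utilization of image information. | scene understanding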
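
Entries 11, 12, and 17 build on Gaussian splatting, and entry 12 also evaluates NeRF; both families render a pixel by front-to-back alpha compositing, C = Σ_i c_i α_i Π_{j<i} (1 − α_j). A minimal sketch of that shared blending step, assuming the samples are already depth-sorted and colors are RGB tuples in [0, 1]. The function name and the early-exit threshold are hypothetical, and this is not code from any listed paper.

```python
def composite_front_to_back(samples, min_transmittance=1e-4):
    """Blend depth-sorted (rgb, alpha) samples for one pixel, nearest first.

    Computes C = sum_i c_i * a_i * prod_{j<i} (1 - a_j) and returns the color
    together with the remaining transmittance. The early-exit threshold is an
    illustrative choice, not a value taken from any listed paper.
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed by nearer samples
    for rgb, alpha in samples:
        weight = alpha * transmittance
        for k in range(3):
            color[k] += weight * rgb[k]
        transmittance *= 1.0 - alpha
        if transmittance < min_transmittance:
            break  # farther samples would contribute almost nothing
    return tuple(color), transmittance
```

NeRF derives each alpha from volume density as 1 − exp(−σ·δ), while 3D Gaussian splatting derives it from each projected Gaussian's opacity and 2D footprint; the blending loop itself is the same.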

🔬 Pillar 9: Embodied Foundation Models (2 papers)

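# | Title | Key Point | Tags
19 | LEO: Boosting Mixture of Vision Encoders for Multimodal Large Language Models | LEO: improves multimodal large language models with a mixture of vision encoders. | large language model, multimodal
20 | Training-Free Motion-Guided Video Generation with Enhanced Temporal Consistency Using Motion Consistency Loss | Proposes a training-free, motion-guided video generation method based on a motion consistency loss, improving temporal consistency. | foundation model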

🔬 Pillar 6: Video Extraction & Matching (2 papers)

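# | Title | Key Point | Tags
21 | Collaborative Learning for 3D Hand-Object Reconstruction and Compositional Action Recognition from Egocentric RGB Videos Using Superquadrics | Proposes a superquadric-based collaborative learning framework for 3D hand-object reconstruction and compositional action recognition from egocentric RGB videos. | egocentric, hand-object reconstruction
22 | Testing Human-Hand Segmentation on In-Distribution and Out-of-Distribution Data in Human-Robot Interactions Using a Deep Ensemble Model | Proposes a deep-ensemble human-hand segmentation method and evaluates it on in-distribution and out-of-distribution data from human-robot interaction. | egocentric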

🔬 Pillar 1: Robot Control (2 papers)

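# | Title | Key Point | Tags
23 | Robust Single Object Tracking in LiDAR Point Clouds under Adverse Weather Conditions | Proposes a benchmark for LiDAR point-cloud single object tracking under adverse weather, together with the DRCT tracker, to improve robustness. | domain randomization, contrastive learning
24 | Guided SAM: Label-Efficient Part Segmentation | Guided SAM: a label-efficient part segmentation method that uses coarse annotations to guide SAM toward precise segmentation. | manipulation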
