cs.CV (2025-01-23)
📊 31 papers in total | 🔗 8 with code
🎯 Interest Area Navigation
Pillar 2: RL & Architecture (10 🔗2)
Pillar 9: Embodied Foundation Models (9 🔗2)
Pillar 3: Perception & Semantics (5)
Pillar 1: Robot Control (3 🔗2)
Pillar 8: Physics-based Animation (2 🔗1)
Pillar 7: Motion Retargeting (1 🔗1)
Pillar 4: Generative Motion (1)
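🔬 Pillar 2: RL & Architecture (10 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | Multi-aspect Knowledge Distillation with Large Language Model | Proposes a multi-aspect knowledge distillation method based on multimodal large language models to improve image classification performance. | distillation, large language model, multimodal | | |
| 2 | QMamba: Post-Training Quantization for Vision State Space Models | QMamba: a post-training quantization framework for vision state space models. | Mamba, SSM, state space model | | |
| 3 | MultiDreamer3D: Multi-concept 3D Customization with Concept-Aware Diffusion Guidance | MultiDreamer3D: proposes a multi-concept 3D customization method with concept-aware diffusion guidance. | dreamer, 3D gaussian splatting, gaussian splatting | | |
| 4 | Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step | Proposes a CoT-based image generation method that significantly improves autoregressive image generation quality through verification and reinforcement steps. | DPO, direct preference optimization, chain-of-thought | ✅ | |
| 5 | MV-GMN: State Space Model for Multi-View Action Recognition | Proposes the MV-GMN model for efficiently processing multimodal, multi-view, and multi-temporal data in multi-view action recognition. | Mamba, state space model | | |
| 6 | Contrast: A Hybrid Architecture of Transformers and State Space Models for Low-Level Vision | Proposes Contrast, a hybrid architecture that combines Transformers and state space models to improve image super-resolution performance. | Mamba, state space model | | |
| 7 | Temporal Preference Optimization for Long-Form Video Understanding | Proposes the Temporal Preference Optimization (TPO) framework to improve the temporal grounding ability of large video models on long videos. | preference learning, multimodal | ✅ | |
| 8 | Improving Video Generation with Human Feedback | Proposes a human-feedback-based optimization pipeline for video generation that addresses unsmooth motion and alignment problems. | reinforcement learning, DPO, direct preference optimization | | |
| 9 | Retrievals Can Be Detrimental: A Contrastive Backdoor Attack Paradigm on Retrieval-Augmented Diffusion Models | Proposes BadRDM, a contrastive backdoor attack on retrieval-augmented diffusion models, exposing the security risks introduced by RAG. | contrastive learning, multimodal | | |
| 10 | A Cognitive Paradigm Approach to Probe the Perception-Reasoning Interface in VLMs | Proposes a cognitive-paradigm evaluation framework to dissect the perception-reasoning interface in vision-language models. | DRL, HOI | | |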
🔬 Pillar 9: Embodied Foundation Models (9 papers)
🔬 Pillar 3: Perception & Semantics (5 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 20 | PromptMono: Cross Prompting Attention for Self-Supervised Monocular Depth Estimation in Challenging Environments | PromptMono: uses cross prompting attention to improve monocular depth estimation in challenging environments. | depth estimation, monocular depth | | |
| 21 | GoDe: Gaussians on Demand for Progressive Level of Detail and Scalable Compression | Proposes GoDe, a Gaussians-on-demand method for progressive level of detail and scalable compression. | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 22 | GC-ConsFlow: Leveraging Optical Flow Residuals and Global Context for Robust Deepfake Detection | GC-ConsFlow: uses optical flow residuals and global context to make deepfake detection more robust. | optical flow, spatiotemporal | | |
| 23 | Deblur-Avatar: Animatable Avatars from Motion-Blurred Monocular Videos | Deblur-Avatar: reconstructs animatable, high-fidelity 3D avatars from motion-blurred monocular videos. | 3D gaussian splatting, 3DGS, gaussian splatting | | |
| 24 | Symmetrization Weighted Binary Cross-Entropy: Modeling Perceptual Asymmetry for Human-Consistent Neural Edge Detection | Proposes the SWBCE loss function to address perceptual asymmetry in edge detection. | scene understanding | | |
🔬 Pillar 1: Robot Control (3 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 25 | LLM-guided Instance-level Image Manipulation with Diffusion U-Net Cross-Attention Maps | Proposes an LLM-guided instance-level image manipulation method that uses diffusion U-Net cross-attention maps for precise editing. | manipulation, open-vocabulary, open vocabulary | ✅ | |
| 26 | Integrating Persian Lip Reading in Surena-V Humanoid Robot for Human-Robot Interaction | Integrates Persian lip reading into the Surena-V robot to improve human-robot interaction. | humanoid, humanoid robot | | |
| 27 | mmEgoHand: Egocentric Hand Pose Estimation and Gesture Recognition with Head-mounted Millimeter-wave Radar and IMU | Proposes mmEgoHand, which uses a head-mounted millimeter-wave radar and IMU for hand pose estimation and gesture recognition. | teleoperation, egocentric | ✅ | |
🔬 Pillar 8: Physics-based Animation (2 papers)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 28 | EventVL: Understand Event Streams via Multimodal Large Language Model | Proposes EventVL, the first generative event-camera multimodal large language model, for explicit semantic understanding. | spatiotemporal, large language model, multimodal | | |
| 29 | Towards Robust Multimodal Open-set Test-time Adaptation via Adaptive Entropy-aware Optimization | Proposes the AEO framework for multimodal open-set test-time adaptation, improving the ability to distinguish unknown-class samples. | AMP, multimodal | ✅ | |
🔬 Pillar 7: Motion Retargeting (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 30 | ME-CPT: Multi-Task Enhanced Cross-Temporal Point Transformer for Urban 3D Change Detection | Proposes ME-CPT for urban 3D change detection, improving extraction of semantic change features from multi-temporal point clouds. | spatial relationship, spatiotemporal | ✅ | |
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | One-line summary | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 31 | Implicit Neural Surface Deformation with Explicit Velocity Fields | Proposes an unsupervised neural implicit surface deformation method based on explicit velocity fields. | physically plausible | | |