cs.CV (2025-05-31)
📊 15 papers in total | 🔗 5 with code
🎯 Interest Area Navigation
Pillar 2: RL Algorithms & Architecture (4 🔗1)
Pillar 1: Robot Control (3 🔗2)
Pillar 9: Embodied Foundation Models (3 🔗2)
Pillar 3: Spatial Perception & Semantics (2)
Pillar 6: Video Extraction & Matching (1)
Pillar 4: Generative Motion (1)
Pillar 8: Physics-based Animation (1)
🔬 Pillar 2: RL Algorithms & Architecture (4 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 1 | SatDreamer360: Multiview-Consistent Generation of Ground-Level Scenes from Satellite Imagery | SatDreamer360: proposes a multiview-consistent framework for generating ground-level scenes from satellite imagery. | dreamer height map | | |
| 2 | SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image Distillation | SenseFlow: distills flow-based text-to-image models by scaling distribution matching. | flow matching distillation | ✅ | |
| 3 | From Local Cues to Global Percepts: Emergent Gestalt Organization in Self-Supervised Vision Models | Shows that global perception organized by Gestalt principles emerges in self-supervised vision models, and proposes the DiSRT benchmark. | MAE spatial relationship | | |
| 4 | CReFT-CAD: Boosting Orthographic Projection Reasoning for CAD via Reinforcement Fine-Tuning | CReFT-CAD: improves orthographic projection reasoning for CAD through reinforcement fine-tuning. | reinforcement learning instruction following | | |
🔬 Pillar 1: Robot Control (3 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 5 | Multimodal Generative AI with Autoregressive LLMs for Human Motion Understanding and Generation: A Way Forward | Survey of multimodal generative AI built on autoregressive LLMs for human motion understanding and generation. | humanoid text-to-motion motion synthesis | | |
| 6 | XYZ-IBD: A High-precision Bin-picking Dataset for Object 6D Pose Estimation Capturing Real-world Industrial Complexity | Proposes the XYZ-IBD dataset to address object 6D pose estimation in real-world industrial environments. | manipulation depth estimation 6D pose estimation | ✅ | |
| 7 | SEED: A Benchmark Dataset for Sequential Facial Attribute Editing with Diffusion Models | Proposes the SEED dataset for evaluating diffusion models on sequential facial attribute editing, along with the FAITH model. | manipulation | ✅ | |
🔬 Pillar 9: Embodied Foundation Models (3 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 8 | Chain-of-Frames: Advancing Video Understanding in Multimodal LLMs via Frame-Aware Reasoning | Proposes Chain-of-Frames, which improves video understanding in multimodal LLMs through frame-aware reasoning. | large language model multimodal chain-of-thought | ✅ | |
| 9 | HueManity: Probing Fine-Grained Visual Perception in MLLMs | HueManity: probes the fine-grained visual perception of multimodal large language models. | large language model multimodal | | |
| 10 | Common Inpainted Objects In-N-Out of Context | Proposes the COinCO dataset to improve models' understanding of image context consistency and their forgery detection ability. | large language model multimodal | ✅ | |
🔬 Pillar 3: Spatial Perception & Semantics (2 papers)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 11 | Improving Optical Flow and Stereo Depth Estimation by Leveraging Uncertainty-Based Learning Difficulties | Uses uncertainty-based learning difficulty to improve the accuracy of optical flow and stereo depth estimation. | depth estimation stereo depth optical flow | | |
| 12 | Test-time Vocabulary Adaptation for Language-driven Object Detection | Proposes VocAda, a test-time vocabulary adaptation method for language-driven object detection that improves detection performance. | open-vocabulary open vocabulary | | |
🔬 Pillar 6: Video Extraction & Matching (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 13 | Sequence-Based Identification of First-Person Camera Wearers in Third-Person Views | Proposes a sequence-based method for identifying first-person camera wearers in third-person views. | egocentric egocentric vision Ego4D | | |
🔬 Pillar 4: Generative Motion (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 14 | Parallel Rescaling: Rebalancing Consistency Guidance for Personalized Diffusion Models | Proposes Parallel Rescaling, which improves the prompt alignment and image quality of personalized diffusion models in few-shot settings. | classifier-free guidance | | |
🔬 Pillar 8: Physics-based Animation (1 paper)
| # | Title | Key Point | Tags | 🔗 | ⭐ |
|---|---|---|---|---|---|
| 15 | Event-based multi-view photogrammetry for high-dynamic, high-velocity target measurement | Proposes an event-camera-based multi-view photogrammetry method for measuring high-dynamic, high-velocity targets. | spatiotemporal | | |