cs.CV (2025-05-31)

📊 15 papers in total | 🔗 5 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (4 🔗1) · Pillar 1: Robot Control (3 🔗2) · Pillar 9: Embodied Foundation Models (3 🔗2) · Pillar 3: Perception & Semantics (2) · Pillar 6: Video Extraction (1) · Pillar 4: Generative Motion (1) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 2: RL & Architecture (4 papers)

#	Title	One-line summary	Tags	🔗	⭐
1	SatDreamer360: Multiview-Consistent Generation of Ground-Level Scenes from Satellite Imagery	SatDreamer360: a multiview-consistent framework for generating ground-level scenes from satellite imagery	dreamer height map
2	SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image Distillation	SenseFlow: scales distribution matching to distill flow-based text-to-image models	flow matching distillation	✅
3	From Local Cues to Global Percepts: Emergent Gestalt Organization in Self-Supervised Vision Models	Shows that global perception following Gestalt principles emerges in self-supervised vision models, and introduces the DiSRT benchmark.	MAE spatial relationship
4	CReFT-CAD: Boosting Orthographic Projection Reasoning for CAD via Reinforcement Fine-Tuning	CReFT-CAD: improves orthographic projection reasoning for CAD through reinforcement fine-tuning	reinforcement learning instruction following

🔬 Pillar 1: Robot Control (3 papers)

#	Title	One-line summary	Tags	🔗	⭐
5	Multimodal Generative AI with Autoregressive LLMs for Human Motion Understanding and Generation: A Way Forward	Survey: multimodal generative AI built on autoregressive LLMs for human motion understanding and generation	humanoid text-to-motion motion synthesis
6	XYZ-IBD: A High-precision Bin-picking Dataset for Object 6D Pose Estimation Capturing Real-world Industrial Complexity	Introduces the XYZ-IBD dataset to address 6D object pose estimation in real industrial environments.	manipulation depth estimation 6D pose estimation	✅
7	SEED: A Benchmark Dataset for Sequential Facial Attribute Editing with Diffusion Models	Introduces the SEED dataset for evaluating diffusion models on sequential facial attribute editing, and proposes the FAITH model.	manipulation	✅

🔬 Pillar 9: Embodied Foundation Models (3 papers)

#	Title	One-line summary	Tags	🔗	⭐
8	Chain-of-Frames: Advancing Video Understanding in Multimodal LLMs via Frame-Aware Reasoning	Proposes Chain-of-Frames, which improves video understanding in multimodal LLMs through frame-aware reasoning	large language model multimodal chain-of-thought	✅
9	HueManity: Probing Fine-Grained Visual Perception in MLLMs	HueManity: probes the fine-grained visual perception of multimodal large language models	large language model multimodal
10	Common Inpainted Objects In-N-Out of Context	Introduces the COinCO dataset to improve models' understanding of image context consistency and their forgery detection.	large language model multimodal	✅

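🔬 Pillar 3: Perception & Semantics (2 papers)

#	Title	One-line summary	Tags	🔗	⭐
11	Improving Optical Flow and Stereo Depth Estimation by Leveraging Uncertainty-Based Learning Difficulties	Uses uncertainty-based learning difficulty to improve the accuracy of optical flow and stereo depth estimation	depth estimation stereo depth optical flow
12	Test-time Vocabulary Adaptation for Language-driven Object Detection	Proposes VocAda, a test-time vocabulary adaptation method for language-driven object detection that improves detection performance.	open-vocabulary open vocabulary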

🔬 Pillar 6: Video Extraction (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
13	Sequence-Based Identification of First-Person Camera Wearers in Third-Person Views	Proposes a sequence-based identification method for recognizing first-person camera wearers in third-person views.	egocentric egocentric vision Ego4D

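🔬 Pillar 4: Generative Motion (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
14	Parallel Rescaling: Rebalancing Consistency Guidance for Personalized Diffusion Models	Proposes parallel rescaling, which improves prompt alignment and image quality for personalized diffusion models in few-shot settings	classifier-free guidance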

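🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
15	Event-based multi-view photogrammetry for high-dynamic, high-velocity target measurement	Proposes an event-camera-based multi-view photogrammetry method for measuring high-dynamic, high-velocity targets.	spatiotemporal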
