cs.CV (2025-07-31)

📊 38 papers in total | 🔗 14 with code

🎯 Interest area navigation

Pillar 2: RL & Architecture (15 🔗6) · Pillar 9: Embodied Foundation Models (10 🔗4) · Pillar 3: Perception & Semantics (9 🔗2) · Pillar 1: Robot Control (3 🔗2) · Pillar 7: Motion Retargeting (1)

🔬 Pillar 2: RL & Architecture (15 papers)

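| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 1 | Gaussian Splatting Feature Fields for Privacy-Preserving Visual Localization | Proposes Gaussian Splatting Feature Fields (GSFFs) for privacy-preserving visual localization. | representation learning, 3D gaussian splatting, 3DGS | |
| 2 | UniLIP: Adapting CLIP for Unified Multimodal Understanding, Generation and Editing | UniLIP: gives CLIP unified multimodal understanding, generation, and editing capabilities through self-distillation and a dual-condition architecture. | distillation, large language model, multimodal | |
| 3 | Multi-Modal Motion Retrieval by Learning a Fine-Grained Joint Embedding Space | Proposes a multi-modal motion retrieval framework that improves retrieval performance by learning a fine-grained joint embedding space. | contrastive learning, text-to-motion, motion generation | |
| 4 | 3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding | 3D-R1: achieves unified scene understanding by enhancing the reasoning ability of 3D vision-language models. | reinforcement learning, RLHF, scene understanding | |
| 5 | Contrastive Learning-Driven Traffic Sign Perception: Multi-Modal Fusion of Text and Vision | Proposes a contrastive-learning-based traffic sign perception framework that fuses text and visual information to improve recognition accuracy under long-tailed distributions. | contrastive learning, open-vocabulary, open vocabulary | |
| 6 | Half-Physics: Enabling Kinematic 3D Human Model with Physical Interactions | Proposes the Half-Physics mechanism, which enables physical interaction between the SMPL-X model and its environment. | reinforcement learning, physically plausible, penetration | |
| 7 | FastDriveVLA: Efficient End-to-End Driving via Plug-and-Play Reconstruction-based Token Pruning | FastDriveVLA: proposes reconstruction-based, plug-and-play token pruning for efficient end-to-end autonomous driving. | MAE, scene understanding, vision-language-action | |
| 8 | VMatcher: State-Space Semi-Dense Local Feature Matching | VMatcher: state-space semi-dense local feature matching that combines Mamba and Transformer. | Mamba, SSM, feature matching | |
| 9 | FASTopoWM: Fast-Slow Lane Segment Topology Reasoning with Latent World Models | FASTopoWM: fast-slow lane segment topology reasoning with latent world models. | world model, scene understanding | |
| 10 | Mamba-based Efficient Spatio-Frequency Motion Perception for Video Camouflaged Object Detection | Proposes Vcamba, a Mamba-based spatio-frequency motion perception network for efficient video camouflaged object detection. | Mamba, state space model | |
| 11 | MamV2XCalib: V2X-based Target-less Infrastructure Camera Calibration with State Space Model | Proposes MamV2XCalib, a target-less infrastructure camera calibration method based on V2X and state space models. | Mamba, state space model | |
| 12 | AGA: An adaptive group alignment framework for structured medical cross-modal representation learning | Proposes the AGA framework, which achieves medical cross-modal representation learning through adaptive group alignment. | representation learning, contrastive learning | |
| 13 | Slot Attention with Re-Initialization and Self-Distillation | Proposes DIAS to address redundancy and supervision issues in object-centric learning. | distillation | |
| 14 | Beyond Linear Bottlenecks: Spline-Based Knowledge Distillation for Culturally Diverse Art Style Classification | Proposes a spline-based knowledge distillation method that improves classification accuracy for culturally diverse art styles. | distillation | |
| 15 | Annotation-Free Reinforcement Learning Query Rewriting via Verifiable Search Reward | Proposes RL-QR, an annotation-free reinforcement learning framework for query rewriting that improves retrieval performance in RAG systems. | reinforcement learning | |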

🔬 Pillar 9: Embodied Foundation Models (10 papers)

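| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 16 | Adversarial-Guided Diffusion for Multimodal LLM Attacks | Proposes Adversarial-Guided Diffusion (AGD) to improve the effectiveness and robustness of adversarial attacks on multimodal large language models. | large language model, multimodal | |
| 17 | On the Risk of Misleading Reports: Diagnosing Textual Biases in Multimodal Clinical AI | Proposes Selective Modality Shifting (SMS) to diagnose the risk of textual bias in multimodal clinical AI. | multimodal | |
| 18 | Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval | Proposes the BLiM framework, which improves MLLM performance on text-video retrieval through bidirectional likelihood estimation and prior normalization. | large language model | |
| 19 | LED Benchmark: Diagnosing Structural Layout Errors for Document Layout Analysis | Proposes the LED benchmark for diagnosing structural layout errors in document layout analysis. | large language model, multimodal | |
| 20 | Punching Bag vs. Punching Person: Motion Transferability in Videos | Proposes a motion transferability evaluation framework, revealing that action recognition models generalize poorly across contexts. | multimodal | |
| 21 | A Quality-Guided Mixture of Score-Fusion Experts Framework for Human Recognition | Proposes a quality-guided mixture-of-experts model (QME) to improve whole-body biometric recognition performance. | multimodal | |
| 22 | Phi-Ground Tech Report: Advancing Perception in GUI Grounding | Phi-Ground: improves the performance of computer-use agents through better perception of GUI environments. | multimodal | |
| 23 | OmniTraj: Pre-Training on Heterogeneous Data for Adaptive and Zero-Shot Human Trajectory Prediction | OmniTraj: adaptive and zero-shot human trajectory prediction through pre-training on heterogeneous data. | zero-shot transfer | |
| 24 | Beyond Gloss: A Hand-Centric Framework for Gloss-Free Sign Language Translation | Proposes BeyondGloss, which uses video large language models for gloss-free sign language translation. | large language model | |
| 25 | Toward Safe, Trustworthy and Realistic Augmented Reality User Experience | Proposes ViDDAR and VIM-Sense to safeguard the safety and trustworthiness of the augmented reality user experience. | multimodal | |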

🔬 Pillar 3: Perception & Semantics (9 papers)

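| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 26 | SeqAffordSplat: Scene-level Sequential Affordance Reasoning on 3D Gaussian Splatting | SeqAffordSplat: scene-level sequential affordance reasoning based on 3D Gaussian Splatting. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 27 | NeRF Is a Valuable Assistant for 3D Gaussian Splatting | NeRF-GS: combines NeRF and 3DGS to improve 3D scene reconstruction performance. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 28 | I2V-GS: Infrastructure-to-Vehicle View Transformation with Gaussian Splatting for Autonomous Driving Data Generation | I2V-GS: uses Gaussian Splatting for infrastructure-to-vehicle view transformation to generate autonomous driving data. | gaussian splatting, splatting | |
| 29 | Enhanced Velocity Field Modeling for Gaussian Video Reconstruction | FlowGaussian-VR: proposes optical-flow-based velocity field modeling to improve 3D Gaussian reconstruction quality for highly dynamic videos. | 3D gaussian splatting, gaussian splatting, splatting | |
| 30 | MagicRoad: Semantic-Aware 3D Road Surface Reconstruction via Obstacle Inpainting | MagicRoad: semantic-aware 3D road surface reconstruction via obstacle inpainting. | 3D gaussian splatting, 3DGS, gaussian splatting | |
| 31 | iLRM: An Iterative Large 3D Reconstruction Model | Proposes iLRM, an iterative large 3D reconstruction model that addresses the scalability limits of existing methods in multi-view, high-resolution settings. | 3D gaussian splatting, gaussian splatting, splatting | |
| 32 | Robust 3D Object Detection using Probabilistic Point Clouds from Single-Photon LiDARs | Proposes Probabilistic Point Clouds (PPC) to make 3D object detection with single-photon LiDARs more robust in challenging scenes. | scene understanding | |
| 33 | World Consistency Score: A Unified Metric for Video Generation Quality | Proposes the World Consistency Score (WCS) for unified evaluation of world consistency in video generation models. | optical flow | |
| 34 | MonoFusion: Sparse-View 4D Reconstruction via Monocular Fusion | MonoFusion: sparse-view 4D dynamic reconstruction via monocular fusion. | scene reconstruction | |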

🔬 Pillar 1: Robot Control (3 papers)

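| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 35 | The Monado SLAM Dataset for Egocentric Visual-Inertial Tracking | Releases the Monado SLAM dataset to address performance bottlenecks of VIO/SLAM on head-mounted devices in challenging scenarios. | humanoid, humanoid robot, VIO | |
| 36 | RAGNet: Large-scale Reasoning-based Affordance Segmentation Benchmark towards General Grasping | RAGNet: builds a large-scale reasoning-based affordance segmentation benchmark to improve general grasping. | manipulation, affordance | |
| 37 | Stable-Sim2Real: Exploring Simulation of Real-Captured 3D Data with Two-Stage Depth Diffusion | Proposes Stable-Sim2Real, which simulates realistic 3D data through a two-stage depth diffusion model. | sim2real | |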

🔬 Pillar 7: Motion Retargeting (1 paper)

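| # | Title | One-line summary | Tags | 🔗 |
|---|---|---|---|---|
| 38 | Hyperbolic Cycle Alignment for Infrared-Visible Image Fusion | Proposes Hy-CycleAlign, a hyperbolic-space-based infrared-visible image registration network, to improve multimodal image fusion. | geometric consistency | |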
