cs.CV(2025-06-30)

📊 43 papers in total | 🔗 16 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (13 🔗4) · Pillar 3: Perception & Semantics (12 🔗4) · Pillar 9: Embodied Foundation Models (11 🔗5) · Pillar 1: Robot Control (3 🔗1) · Pillar 4: Generative Motion (1) · Pillar 7: Motion Retargeting (1) · Pillar 8: Physics-based Animation (1 🔗1) · Pillar 6: Video Extraction (1 🔗1)

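🔬 Pillar 2: RL & Architecture (13 papers)

# | Title | One-line summary | Tags | 🔗
1 | VOCAL: Visual Odometry via ContrAstive Learning | Proposes the VOCAL framework to address the interpretability of visual odometry | representation learning, contrastive learning, visual odometry
2 | Mamba-FETrack V2: Revisiting State Space Model for Frame-Event based Visual Object Tracking | Proposes Mamba-FETrack V2 to address the efficiency of multimodal visual object tracking | Mamba, state space model, multimodal
3 | JAM-Flow: Joint Audio-Motion Synthesis with Flow Matching | Proposes JAM-Flow to address joint audio and facial motion synthesis | flow matching, motion synthesis
4 | Embedding-based Retrieval in Multimodal Content Moderation | Proposes an embedding-based retrieval method to improve the efficiency of short-video content moderation | contrastive learning, multimodal
5 | Towards foundational LiDAR world models with efficient latent flow matching | Proposes a LiDAR world model based on latent conditional flow matching to address domain transfer | flow matching, world model
6 | Dataset Distillation via Vision-Language Category Prototype | Proposes a distillation method using vision-language category prototypes to improve dataset distillation performance | distillation, large language model
7 | LLM-enhanced Action-aware Multi-modal Prompt Tuning for Image-Text Matching | Proposes LLM-enhanced, action-aware multi-modal prompt tuning to address image-text matching | representation learning, spatial relationship, large language model
8 | NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments | Proposes NavMorph to address environment adaptation in vision-and-language navigation | world model, VLN
9 | CS-VLM: Compressed Sensing Attention for Efficient Vision-Language Representation Learning | Proposes a compressed sensing attention mechanism to address the computational bottleneck of vision-language models | representation learning, multimodal
10 | Room Scene Discovery and Grouping in Unstructured Vacation Rental Image Collections | Proposes a room scene discovery and grouping method to address the lack of structure in vacation rental image collections | contrastive learning, large language model
11 | FADRM: Fast and Accurate Data Residual Matching for Dataset Distillation | Proposes FADRM to address information vanishing in dataset distillation | distillation
12 | From Sight to Insight: Unleashing Eye-Tracking in Weakly Supervised Video Salient Object Detection | Proposes an eye-tracking-based method for weakly supervised video salient object detection | contrastive learning, spatiotemporal
13 | When Test-Time Adaptation Meets Self-Supervised Models | Proposes a self-supervised test-time adaptation protocol to improve model performance | contrastive learning, distillation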

🔬 Pillar 3: Perception & Semantics (12 papers)

# | Title | One-line summary | Tags | 🔗
14 | AttentionGS: Towards Initialization-Free 3D Gaussian Splatting via Structural Attention | Proposes AttentionGS to remove the reliance on high-quality point clouds in 3D reconstruction | 3D gaussian splatting, 3DGS, gaussian splatting
15 | PGOV3D: Open-Vocabulary 3D Semantic Segmentation with Partial-to-Global Curriculum | Proposes PGOV3D to address information transfer in open-vocabulary 3D semantic segmentation | open-vocabulary, open vocabulary, large language model
16 | MILo: Mesh-In-the-Loop Gaussian Splatting for Detailed and Efficient Surface Reconstruction | Proposes the MILo framework to address high-quality 3D surface reconstruction | gaussian splatting, splatting
17 | Can We Challenge Open-Vocabulary Object Detectors with Generated Content in Street Scenes? | Uses generated content to probe the limitations of open-vocabulary object detectors | open-vocabulary, open vocabulary
18 | Instant GaussianImage: A Generalizable and Self-Adaptive Image Representation via 2D Gaussian Splatting | Proposes an adaptive Gaussian image representation framework to address low training efficiency | gaussian splatting, splatting
19 | Diffusion-Based Image Augmentation for Semantic Segmentation in Outdoor Robotics | Proposes a diffusion-based image augmentation method to address semantic segmentation in outdoor robotics | open-vocabulary, open vocabulary, foundation model
20 | TextMesh4D: Text-to-4D Mesh Generation via Jacobian Deformation Field | Proposes TextMesh4D to address dynamic 3D mesh generation | 3DGS, NeRF, spatiotemporal
21 | PriOr-Flow: Enhancing Primitive Panoramic Optical Flow with Orthogonal View | Proposes PriOr-Flow to address polar-region distortion in panoramic optical flow estimation | optical flow
22 | Proteus-ID: ID-Consistent and Motion-Coherent Video Customization | Proposes Proteus-ID to address identity consistency and motion coherence in video | optical flow, multimodal
23 | Computer Vision for Objects used in Group Work: Challenges and Opportunities | Proposes the FiboSB dataset to address 6D pose estimation in collaborative tasks | 6D pose estimation
24 | C3VDv2 -- Colonoscopy 3D video dataset with enhanced realism | Proposes the C3VDv2 dataset to address the insufficient training of 3D colonoscopy reconstruction algorithms | optical flow
25 | SCORP: Scene-Consistent Object Refinement via Proxy Generation and Tuning | Proposes SCORP to address missing object viewpoints in scene reconstruction | scene reconstruction

🔬 Pillar 9: Embodied Foundation Models (11 papers)

# | Title | One-line summary | Tags | 🔗
26 | A Survey on Vision-Language-Action Models for Autonomous Driving | Surveys vision-language-action models to advance autonomous driving | vision-language-action, VLA, large language model
27 | Evaluation of Geolocation Capabilities of Multimodal Large Language Models and Analysis of Associated Privacy Risks | Evaluates the geolocation capabilities of multimodal large language models to address privacy risks | large language model, multimodal
28 | Unified Multimodal Understanding via Byte-Pair Visual Encoding | Proposes a unified multimodal understanding framework to address modality alignment | large language model, foundation model, multimodal
29 | Foundation Models for Zero-Shot Segmentation of Scientific Images without AI-Ready Data | Proposes Zenesis to address zero-shot segmentation of scientific images | foundation model, multimodal
30 | DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World | Proposes DenseWorld-1M to address the lack of detail in existing image captioning datasets | large language model, multimodal, visual grounding
31 | Towards an Automated Multimodal Approach for Video Summarization: Building a Bridge Between Text, Audio and Facial Cue-Based Summarization | Proposes a multimodal video summarization method to improve video content understanding | multimodal
32 | Flash-VStream: Efficient Real-Time Understanding for Long Video Streams | Proposes Flash-VStream to address the efficiency of long-video understanding | large language model, multimodal
33 | On the Domain Robustness of Contrastive Vision-Language Models | Proposes the Deepbench framework to evaluate the domain robustness of vision-language models | large language model, foundation model
34 | VAP-Diffusion: Enriching Descriptions with MLLMs for Enhanced Medical Image Generation | Proposes VAP-Diffusion to address insufficient descriptions in medical image generation | large language model, chain-of-thought
35 | Learning Frequency and Memory-Aware Prompts for Multi-Modal Object Tracking | Proposes frequency- and memory-aware prompts to address multi-modal object tracking | foundation model
36 | AI-Generated Lecture Slides for Improving Slide Element Detection and Retrieval | Proposes SynLecSlideGen to address lecture slide element detection and retrieval | large language model

🔬 Pillar 1: Robot Control (3 papers)

# | Title | One-line summary | Tags | 🔗
37 | Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers | Proposes a "thinking with images" framework for advancing multimodal reasoning to address the semantic gap | manipulation, multimodal, chain-of-thought
38 | Epona: Autoregressive Diffusion World Model for Autonomous Driving | Proposes Epona to address long-horizon prediction in autonomous driving | motion planning, world model, spatiotemporal
39 | SG-LDM: Semantic-Guided LiDAR Generation via Latent-Aligned Diffusion | Proposes SG-LDM to address LiDAR point cloud generation | manipulation

🔬 Pillar 4: Generative Motion (1 paper)

# | Title | One-line summary | Tags | 🔗
40 | MotionGPT3: Human Motion as a Second Modality | Proposes MotionGPT3 to address multimodal motion understanding and generation | motion generation, MotionGPT, large language model

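🔬 Pillar 7: Motion Retargeting (1 paper)

# | Title | One-line summary | Tags | 🔗
41 | A Unified Framework for Stealthy Adversarial Generation via Latent Optimization and Transferability Enhancement | Proposes a unified framework to address the transferability of adversarial examples generated with diffusion models | latent optimization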

🔬 Pillar 8: Physics-based Animation (1 paper)

# | Title | One-line summary | Tags | 🔗
42 | Ella: Embodied Social Agents with Lifelong Memory | Proposes Ella to address lifelong learning for social agents | spatiotemporal, foundation model, multimodal

🔬 Pillar 6: Video Extraction (1 paper)

# | Title | One-line summary | Tags | 🔗
43 | RGC-VQA: An Exploration Database for Robotic-Generated Video Quality Assessment | Proposes RGC-VQA to address quality assessment of robot-generated videos | egocentric
