cs.CV (2025-08-05)

📊 47 papers in total | 🔗 12 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (16 🔗4)
Pillar 3: Perception & Semantics (10 🔗3)
Pillar 2: RL & Architecture (10 🔗3)
Pillar 6: Video Extraction (3)
Pillar 7: Motion Retargeting (2 🔗2)
Pillar 1: Robot Control (2)
Pillar 4: Generative Motion (2)
Pillar 8: Physics-based Animation (2)

🔬 Pillar 9: Embodied Foundation Models (16 papers)

# | Title | One-sentence summary | Tags | 🔗
1 | LongVie: Multimodal-Guided Controllable Ultra-Long Video Generation | Proposes LongVie to address controllability and consistency in ultra-long video generation | multimodal
2 | SAM2-UNeXT: An Improved High-Resolution Baseline for Adapting Foundation Models to Downstream Segmentation Tasks | Proposes SAM2-UNeXT to improve foundation-model performance on downstream segmentation tasks | foundation model
3 | Quality-Aware Language-Conditioned Local Auto-Regressive Anomaly Synthesis and Detection | Proposes the ARAS method to address structural defects in existing anomaly synthesis | language conditioned
4 | Semantic Mosaicing of Histo-Pathology Image Fragments using Visual Foundation Models | Proposes SemanticStitcher to address stitching of histopathology images | foundation model
5 | MedCAL-Bench: A Comprehensive Benchmark on Cold-Start Active Learning with Foundation Models for Medical Image Analysis | Proposes MedCAL-Bench to address cold-start active learning in medical image analysis | foundation model
6 | Zero-shot Shape Classification of Nanoparticles in SEM Images using Vision Foundation Models | Proposes a zero-shot classification method to address nanoparticle shape recognition | foundation model
7 | Beyond Meme Templates: Limitations of Visual Similarity Measures in Meme Matching | Proposes visual similarity measures that go beyond template matching to address meme matching | large language model, multimodal
8 | CoEmoGen: Towards Semantically-Coherent and Scalable Emotional Image Content Generation | Proposes CoEmoGen to address semantic incoherence in emotional image generation | large language model, multimodal
9 | R2GenKG: Hierarchical Multi-modal Knowledge Graph for LLM-based Radiology Report Generation | Proposes R2GenKG to address hallucination and insufficient diagnostic capability in X-ray report generation | large language model, foundation model
10 | Less is More: Token-Efficient Video-QA via Adaptive Frame-Pruning and Semantic Graph Integration | Proposes adaptive frame pruning and semantic graph integration to address redundancy in video question answering | large language model, multimodal
11 | Augmenting Continual Learning of Diseases with LLM-Generated Visual Concepts | Proposes using LLM-generated visual concepts to enhance continual learning of diseases | large language model, multimodal
12 | Enhancing Long Video Question Answering with Scene-Localized Frame Grouping | Proposes the SLFG method to address information extraction in long video question answering | large language model, multimodal
13 | ParticleSAM: Small Particle Segmentation for Material Quality Monitoring in Recycling Processes | Proposes ParticleSAM to address small-particle segmentation in construction material recycling | foundation model
14 | VLMQ: Efficient Post-Training Quantization for Large Vision-Language Models via Hessian Augmentation | Proposes VLMQ to address post-training quantization of vision-language models | large language model
15 | Bias Beyond Demographics: Probing Decision Boundaries in Black-Box LVLMs via Counterfactual VQA | Proposes a counterfactual VQA benchmark to audit decision biases in black-box LVLMs | multimodal
16 | Multi-Granularity Feature Calibration via VFM for Domain Generalized Semantic Segmentation | Proposes a multi-granularity feature calibration method to address domain-generalized semantic segmentation | foundation model

🔬 Pillar 3: Perception & Semantics (10 papers)

# | Title | One-sentence summary | Tags | 🔗
17 | SA-3DGS: A Self-Adaptive Compression Method for 3D Gaussian Splatting | Proposes SA-3DGS to address compression of 3D Gaussians | 3D gaussian splatting, 3DGS, gaussian splatting
18 | Open-Vocabulary HOI Detection with Interaction-aware Prompt and Concept Calibration | Proposes INP-CC to address interaction recognition in open-vocabulary HOI detection | open-vocabulary, open vocabulary, human-object interaction
19 | RobustGS: Unified Boosting of Feedforward 3D Gaussian Splatting under Low-Quality Conditions | Proposes RobustGS to address 3D reconstruction under low-quality conditions | 3D gaussian splatting, 3DGS, gaussian splatting
20 | Uni3R: Unified 3D Reconstruction and Semantic Understanding via Generalizable Gaussian Splatting from Unposed Multi-View Images | Proposes Uni3R to address 3D reconstruction and semantic understanding from unposed multi-view images | gaussian splatting, splatting, scene reconstruction
21 | Duplex-GS: Proxy-Guided Weighted Blending for Real-Time Order-Independent Gaussian Splatting | Proposes Duplex-GS to address real-time, efficient Gaussian rendering | 3D gaussian splatting, 3DGS, gaussian splatting
22 | Monocular Depth Estimation with Global-Aware Discretization and Local Context Modeling | Proposes a Gated Large Kernel Attention Module to address monocular depth estimation | depth estimation, monocular depth
23 | H3R: Hybrid Multi-view Correspondence for Generalizable 3D Reconstruction | Proposes the H3R framework to address the challenges of multi-view correspondence modeling | 3D gaussian splatting, gaussian splatting, splatting
24 | Trace3D: Consistent Segmentation Lifting via Gaussian Instance Tracing | Proposes Gaussian instance tracing to address inconsistent 2D-to-3D segmentation | gaussian splatting, splatting
25 | Video Demoireing using Focused-Defocused Dual-Camera System | Proposes a dual-camera system to address moiré patterns in video | optical flow
26 | CHARM: Collaborative Harmonization across Arbitrary Modalities for Modality-agnostic Semantic Segmentation | Proposes CHARM to address homogenization in multimodal semantic segmentation | scene understanding

🔬 Pillar 2: RL & Architecture (10 papers)

# | Title | One-sentence summary | Tags | 🔗
27 | FedPromo: Federated Lightweight Proxy Models at the Edge Bring New Domains to Foundation Models | Proposes FedPromo to address limited resources on edge devices | distillation, foundation model
28 | AlignCAT: Visual-Linguistic Alignment of Category and Attribute for Weakly Supervised Visual Grounding | Proposes AlignCAT to address semantic alignment in weakly supervised visual grounding | contrastive learning, visual grounding
29 | Nexus-INR: Diverse Knowledge-guided Arbitrary-Scale Multimodal Medical Image Super-Resolution | Proposes Nexus-INR to address arbitrary-scale super-resolution of multimodal medical images | distillation, multimodal
30 | AVATAR: Reinforcement Learning to See, Hear, and Reason Over Video | Proposes AVATAR to address data efficiency and credit assignment in multimodal video reasoning | reinforcement learning, spatiotemporal, multimodal
31 | BaroPoser: Real-time Human Motion Tracking from IMUs and Barometers in Everyday Devices | Proposes BaroPoser to address human motion tracking on uneven terrain | representation learning, motion tracking
32 | V.I.P. : Iterative Online Preference Distillation for Efficient Video Diffusion Models | Proposes the V.I.P. framework to address the high computational cost of video diffusion models | DPO, VIP, distillation
33 | LiDARCrafter: Dynamic 4D World Modeling from LiDAR Sequences | Proposes LiDARCrafter to address dynamic 4D world modeling | world model, occupancy grid
34 | RAVID: Retrieval-Augmented Visual Detection: A Knowledge-Driven Approach for AI-Generated Image Identification | Proposes the RAVID framework to address AI-generated image detection | representation learning, foundation model
35 | Architectural Insights into Knowledge Distillation for Object Detection: A Comprehensive Review | Proposes architecture-based knowledge distillation methods to address challenges in object detection | distillation
36 | Causal Disentanglement and Cross-Modal Alignment for Enhanced Few-Shot Learning | Proposes the Causal CLIP Adapter to address representation entanglement in few-shot learning | contrastive learning, multimodal

🔬 Pillar 6: Video Extraction (3 papers)

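# | Title | One-sentence summary | Tags | 🔗
37 | EgoPrompt: Prompt Learning for Egocentric Action Recognition | Proposes EgoPrompt to address semantic relationships in egocentric action recognition | egocentric, Ego4D
38 | WaMo: Wavelet-Enhanced Multi-Frequency Trajectory Analysis for Fine-Grained Text-Motion Retrieval | Proposes the WaMo framework to address matching between text and 3D motion sequences | motion retrieval
39 | COFFEE: A Shadow-Resilient Real-Time Pose Estimator for Unknown Tumbling Asteroids using Sparse Neural Networks | Proposes COFFEE to address real-time pose estimation of unknown tumbling asteroids | feature matching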

🔬 Pillar 7: Motion Retargeting (2 papers)

# | Title | One-sentence summary | Tags | 🔗
40 | LORE: Latent Optimization for Precise Semantic Control in Rectified Flow-based Image Editing | Proposes LORE to address semantic control in image editing | latent optimization
41 | MILD: Multi-Layer Diffusion Strategy for Complex and Precise Multi-IP Aware Human Erasing | Proposes a multi-layer diffusion strategy to address human erasing in complex scenes | spatial relationship

🔬 Pillar 1: Robot Control (2 papers)

# | Title | One-sentence summary | Tags | 🔗
42 | ActionSink: Toward Precise Robot Manipulation with Dynamic Integration of Action Flow | Proposes ActionSink to address insufficient precision in robot manipulation | manipulation, optical flow
43 | DepthGait: Multi-Scale Cross-Level Feature Fusion of RGB-Derived Depth and Silhouette Sequences for Robust Gait Recognition | Proposes DepthGait to address modality fusion in gait recognition | locomotion

🔬 Pillar 4: Generative Motion (2 papers)

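# | Title | One-sentence summary | Tags | 🔗
44 | RAAG: Ratio Aware Adaptive Guidance | Proposes an adaptive guidance method to address sampling instability in flow-based generative models | classifier-free guidance
45 | Diffusion Models with Adaptive Negative Sampling Without External Resources | Proposes an adaptive negative sampling method to improve the image generation quality of diffusion models | classifier-free guidance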

🔬 Pillar 8: Physics-based Animation (2 papers)

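# | Title | One-sentence summary | Tags | 🔗
46 | Fast Magnetic Resonance Simulation Using Combined Update with Grouped Isochromats | Proposes a fast magnetic resonance simulation method based on grouped isochromats to reduce computation time | PULSE
47 | MoCA: Identity-Preserving Text-to-Video Generation via Mixture of Cross Attention | Proposes MoCA to address identity preservation in text-to-video generation | spatiotemporal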
