cs.CV (2025-08-19)

📊 31 papers in total | 🔗 8 with code

🎯 Interest Area Navigation

Pillar 2: RL & Architecture (10 🔗5) · Pillar 9: Embodied Foundation Models (7) · Pillar 3: Perception & Semantics (6 🔗2) · Pillar 7: Motion Retargeting (2) · Pillar 6: Video Extraction (2 🔗1) · Pillar 8: Physics-based Animation (2) · Pillar 4: Generative Motion (1) · Pillar 1: Robot Control (1)

🔬 Pillar 2: RL & Architecture (10 papers)

# | Title | One-line summary | Tags | 🔗
1 | GALA: Guided Attention with Language Alignment for Open Vocabulary Gaussian Splatting | Proposes the GALA framework for open-vocabulary 3D scene understanding | contrastive learning, 3D gaussian splatting, 3DGS
2 | Distilled-3DGS: Distilled 3D Gaussian Splatting | Proposes distilled 3D Gaussian splatting to address the storage cost of high-fidelity rendering | distillation, 3D gaussian splatting, 3DGS
3 | PhysGM: Large Physical Gaussian Model for Feed-Forward 4D Synthesis | Proposes PhysGM to address efficiency and accuracy in physically grounded 4D synthesis | DPO, direct preference optimization, distillation
4 | Pixels to Play: A Foundation Model for 3D Gameplay | Proposes Pixels2Play-0.1 to address behavior generation for 3D game-playing agents | behavior cloning, foundation model
5 | Structured Prompting and Multi-Agent Knowledge Distillation for Traffic Video Interpretation and Risk Inference | Proposes structured prompting and multi-agent knowledge distillation for traffic video understanding | distillation, scene understanding, chain-of-thought
6 | Diversity-enhanced Collaborative Mamba for Semi-supervised Medical Image Segmentation | Proposes Diversity-enhanced Collaborative Mamba for semi-supervised medical image segmentation | Mamba, SSM, state space model
7 | Towards Efficient Vision State Space Models via Token Merging | Proposes MaMe to address the computational efficiency of SSM models | SSM, state space model
8 | LENS: Learning to Segment Anything with Unified Reinforced Reasoning | Proposes the LENS framework to address insufficient reasoning in text-prompted image segmentation | reinforcement learning, chain-of-thought
9 | Backdooring Self-Supervised Contrastive Learning by Noisy Alignment | Proposes a noisy alignment method for backdoor attacks on self-supervised contrastive learning | contrastive learning
10 | Multi-view Clustering via Bi-level Decoupling and Consistency Learning | Proposes a bi-level decoupling and consistency learning framework to improve multi-view clustering | representation learning, contrastive learning

🔬 Pillar 9: Embodied Foundation Models (7 papers)

# | Title | One-line summary | Tags | 🔗
11 | PersonaVlog: Personalized Multimodal Vlog Generation with Multi-Agent Collaboration and Iterative Self-Correction | Proposes PersonaVlog for personalized short-video generation | large language model, multimodal
12 | A Fully Transformer Based Multimodal Framework for Explainable Cancer Image Segmentation Using Radiology Reports | Proposes Med-CTX to address explainability in breast cancer ultrasound image segmentation | multimodal
13 | Directed-Tokens: A Robust Multi-Modality Alignment Approach to Large Language-Vision Models | Proposes directed tokens to address multimodal alignment | multimodal, instruction following
14 | MMIS-Net for Retinal Fluid Segmentation and Detection | Proposes MMIS-Net for retinal fluid segmentation and detection | foundation model, multimodal
15 | Enhancing Targeted Adversarial Attacks on Large Vision-Language Models via Intermediate Projector | Proposes an intermediate projector to strengthen adversarial attacks on large vision-language models | large language model, multimodal
16 | HumanPCR: Probing MLLM Capabilities in Diverse Human-Centric Scenes | Proposes HumanPCR to evaluate multimodal models in complex human-centric scenes | multimodal, chain-of-thought
17 | Revisiting MLLM Token Technology through the Lens of Classical Visual Coding | Revisits MLLM token technology through classical visual coding to improve information transmission efficiency | large language model, multimodal

🔬 Pillar 3: Perception & Semantics (6 papers)

# | Title | One-line summary | Tags | 🔗
18 | Online 3D Gaussian Splatting Modeling with Novel View Selection | Proposes an online 3D Gaussian splatting modeling method to address incomplete scene reconstruction | 3D gaussian splatting, 3DGS, gaussian splatting
19 | LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos | Proposes LongSplat to address view synthesis for long videos | 3D gaussian splatting, gaussian splatting, splatting
20 | ROVR-Open-Dataset: A Large-Scale Depth Dataset for Autonomous Driving | Proposes the ROVR dataset to address insufficient diversity in depth estimation | depth estimation, monocular depth, scene understanding
21 | EAvatar: Expression-Aware Head Avatar Reconstruction with Generative Geometry Priors | Proposes EAvatar to address expression capture in high-fidelity head avatar reconstruction | 3D gaussian splatting, 3DGS, gaussian splatting
22 | MR6D: Benchmarking 6D Pose Estimation for Mobile Robots | Proposes the MR6D dataset for 6D pose estimation on mobile robots | 6D pose estimation
23 | MF-LPR$^2$: Multi-Frame License Plate Image Restoration and Recognition using Optical Flow | Proposes MF-LPR$^2$ for restoring and recognizing low-quality license plate images | optical flow

🔬 Pillar 7: Motion Retargeting (2 papers)

# | Title | One-line summary | Tags | 🔗
24 | RotBench: Evaluating Multimodal Large Language Models on Identifying Image Rotation | Proposes RotBench to evaluate how well multimodal large language models identify image rotation | spatial relationship, large language model, multimodal
25 | Calibrating Biased Distribution in VFM-derived Latent Space via Cross-Domain Geometric Consistency | Proposes a geometric-knowledge-guided distribution calibration method to address sample bias | geometric consistency, foundation model

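🔬 Pillar 6: Video Extraction (2 papers)

# | Title | One-line summary | Tags | 🔗
26 | RynnEC: Bringing MLLMs into Embodied World | Proposes RynnEC to apply multimodal large language models to embodied cognition | egocentric, large language model, foundation model
27 | Self-Supervised Sparse Sensor Fusion for Long Range Perception | Proposes self-supervised sparse sensor fusion for long-range perception | sparse sensors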

🔬 Pillar 8: Physics-based Animation (2 papers)

# | Title | One-line summary | Tags | 🔗
28 | UNICON: UNIfied CONtinual Learning for Medical Foundational Models | Proposes the UNICON framework for continual learning of medical foundation models | UniCon, foundation model
29 | FAMNet: Integrating 2D and 3D Features for Micro-expression Recognition via Multi-task Learning and Hierarchical Attention | Proposes FAMNet to address feature extraction challenges in micro-expression recognition | spatiotemporal

🔬 Pillar 4: Generative Motion (1 paper)

# | Title | One-line summary | Tags | 🔗
30 | VisionLaw: Inferring Interpretable Intrinsic Dynamics from Visual Observations via Bilevel Optimization | Proposes VisionLaw to infer the intrinsic dynamics of objects | physically plausible

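🔬 Pillar 1: Robot Control (1 paper)

# | Title | One-line summary | Tags | 🔗
31 | EDTalk++: Full Disentanglement for Controllable Talking Head Synthesis | Proposes EDTalk++ to address feature disentanglement in controllable talking head synthesis | manipulation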
