cs.CV (2025-08-27)

📊 31 papers in total | 🔗 8 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (9 🔗2) · Pillar 9: Embodied Foundation Models (9 🔗4) · Pillar 2: RL Algorithms & Architecture (9 🔗2) · Pillar 1: Robot Control (2) · Pillar 5: Interaction & Reaction (1) · Pillar 6: Video Extraction & Matching (1)

🔬 Pillar 3: Spatial Perception & Semantics (9 papers)

# | Title | One-line summary | Tags | 🔗
1 | MAPo: Motion-Aware Partitioning of Deformable 3D Gaussian Splatting for High-Fidelity Dynamic Scene Reconstruction | Proposes MAPo to address blurry rendering in dynamic scene reconstruction | 3D gaussian splatting, 3DGS, gaussian splatting |
2 | LabelGS: Label-Aware 3D Gaussian Splatting for 3D Scene Segmentation | Proposes LabelGS to address limited 3D scene segmentation capability | 3D gaussian splatting, 3DGS, gaussian splatting |
3 | Seam360GS: Seamless 360° Gaussian Splatting from Real-World Omnidirectional Images | Proposes Seam360GS to address imperfect rendering of 360° images | 3D gaussian splatting, gaussian splatting, splatting |
4 | OpenM3D: Open Vocabulary Multi-view Indoor 3D Object Detection without Human Annotations | Proposes OpenM3D for multi-view indoor 3D object detection without human annotations | open-vocabulary, open vocabulary |
5 | Scalable Object Detection in the Car Interior With Vision Foundation Models | Proposes the ODAL framework for object detection and localization in car interiors | scene understanding, foundation model |
6 | FastAvatar: Towards Unified Fast High-Fidelity 3D Avatar Reconstruction with Large Gaussian Reconstruction Transformers | Proposes FastAvatar to address high time complexity and low data utilization | 3D gaussian splatting, 3DGS, gaussian splatting |
7 | Context-aware Sparse Spatiotemporal Learning for Event-based Vision | Proposes context-aware sparse spatiotemporal learning for event-based vision processing | optical flow, spatiotemporal |
8 | Q-Align: Alleviating Attention Leakage in Zero-Shot Appearance Transfer via Query-Query Alignment | Proposes Q-Align to address attention leakage in zero-shot appearance transfer | semantic mapping, semantic map, structure preservation |
9 | AutoQ-VIS: Improving Unsupervised Video Instance Segmentation via Automatic Quality Assessment | Proposes AutoQ-VIS to address quality assessment in unsupervised video instance segmentation | optical flow |

🔬 Pillar 9: Embodied Foundation Models (9 papers)

# | Title | One-line summary | Tags | 🔗
10 | How Multimodal LLMs Solve Image Tasks: A Lens on Visual Grounding, Task Reasoning, and Answer Decoding | Proposes an analysis framework for multimodal LLMs to reveal how they process visual tasks | large language model, multimodal, visual grounding |
11 | Grounding Multimodal Large Language Models with Quantitative Skin Attributes: A Retrieval Study | Combines quantitative skin attributes with multimodal LLMs to improve the interpretability of skin disease diagnosis | large language model, multimodal |
12 | CVBench: Evaluating Cross-Video Synergies for Complex Multimodal Understanding and Reasoning | Proposes CVBench to evaluate multi-video relational reasoning | large language model, multimodal, chain-of-thought |
13 | AudioStory: Generating Long-Form Narrative Audio with Large Language Models | Proposes AudioStory for long-form narrative audio generation | large language model, instruction following |
14 | Multimodal Conditional MeshGAN for Personalized Aneurysm Growth Prediction | Proposes MCMeshGAN for personalized aneurysm growth prediction | multimodal |
15 | AIM: Adaptive Intra-Network Modulation for Balanced Multimodal Learning | Proposes adaptive intra-network modulation to address imbalance in multimodal learning | multimodal |
16 | Mind the Third Eye! Benchmarking Privacy Awareness in MLLM-powered Smartphone Agents | Proposes a large-scale benchmark to evaluate the privacy awareness of smartphone agents | large language model, multimodal |
17 | Integrating SAM Supervision for 3D Weakly Supervised Point Cloud Segmentation | Integrates SAM supervision for weakly supervised 3D point cloud segmentation | foundation model |
18 | The Return of Structural Handwritten Mathematical Expression Recognition | Proposes a structural handwritten mathematical expression recognition method to address symbol alignment | large language model |

🔬 Pillar 2: RL Algorithms & Architecture (9 papers)

# | Title | One-line summary | Tags | 🔗
19 | Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies | Proposes Discrete Diffusion VLA to address unification in vision-language-action models | transformer policy, vision-language-action, VLA |
20 | Mitigating Hallucinations in Multimodal LLMs via Object-aware Preference Optimization | Proposes CHAIR-DPO to reduce hallucinations in multimodal LLMs | DPO, direct preference optimization, large language model |
21 | Plug-in Feedback Self-adaptive Attention in CLIP for Training-free Open-Vocabulary Segmentation | Proposes a feedback self-adaptive attention mechanism for open-vocabulary segmentation with CLIP | MAE, open-vocabulary, open vocabulary |
22 | Multimodal Prototype Alignment for Semi-supervised Pathology Image Segmentation | Proposes MPAMatch to address ambiguous boundaries in pathology image segmentation | contrastive learning, foundation model, multimodal |
23 | MotionFlux: Efficient Text-Guided Motion Generation through Rectified Flow Matching and Preference Alignment | Proposes MotionFlux to address efficiency and accuracy in text-driven motion generation | flow matching, motion generation |
24 | Bridging Domain Gaps for Fine-Grained Moth Classification Through Expert-Informed Adaptation and Foundation Model Priors | Proposes a lightweight classification method for fine-grained moth recognition | distillation, foundation model |
25 | CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning | Proposes the CODA framework for autonomous agent execution in scientific computing | reinforcement learning, generalist agent |
26 | ATMS-KD: Adaptive Temperature and Mixed Sample Knowledge Distillation for a Lightweight Residual CNN in Agricultural Embedded Systems | Proposes the ATMS-KD framework to improve lightweight CNN performance in agricultural embedded systems | distillation |
27 | Self-supervised structured object representation learning | Proposes self-supervised structured object representation learning to improve visual understanding | representation learning |

🔬 Pillar 1: Robot Control (2 papers)

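# | Title | One-line summary | Tags | 🔗
28 | Ego-centric Predictive Model Conditioned on Hand Trajectories | Proposes a unified predictive model for actions and their visual outcomes in human-object interaction | manipulation, predictive model, human-object interaction |
29 | Improving Generalization in Deepfake Detection with Face Foundation Models and Metric Learning | Proposes a deepfake detection framework based on face foundation models and metric learning to improve generalization | manipulation, foundation model |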

🔬 Pillar 5: Interaction & Reaction (1 paper)

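# | Title | One-line summary | Tags | 🔗
30 | Interact-Custom: Customized Human Object Interaction Image Generation | Proposes Interact-Custom to address identity and interaction control in human-object interaction image generation | human-object interaction, HOI, CHOIS |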

🔬 Pillar 6: Video Extraction & Matching (1 paper)

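# | Title | One-line summary | Tags | 🔗
31 | SAT: Supervisor Regularization and Animation Augmentation for Two-process Monocular Texture 3D Human Reconstruction | Proposes the SAT framework to address geometric ambiguity in monocular textured 3D human reconstruction | SMPL |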
