cs.CV (2025-01-24)

📊 17 papers in total | 🔗 3 with code

🎯 Interest Area Navigation

Pillar 3: Spatial Perception & Semantics (7 🔗1) · Pillar 9: Embodied Foundation Models (5) · Pillar 2: RL Algorithms & Architecture (4 🔗2) · Pillar 8: Physics-based Animation (1)

🔬 Pillar 3: Spatial Perception & Semantics (7 papers)

#	Title	One-line Summary	Tags	🔗	⭐
1	Trick-GS: A Balanced Bag of Tricks for Efficient Gaussian Splatting	Trick-GS: an efficient Gaussian Splatting method for resource-constrained devices.	gaussian splatting, splatting
2	Micro-macro Wavelet-based Gaussian Splatting for 3D Reconstruction from Unconstrained Images	Proposes a micro-macro, wavelet-based Gaussian Splatting method for 3D reconstruction from unconstrained images.	gaussian splatting, splatting
3	Scene Understanding Enabled Semantic Communication with Open Channel Coding	Proposes OpenSC, a semantic communication system that combines scene understanding, LLMs, and open channel coding.	scene understanding, large language model
4	Scalable Benchmarking and Robust Learning for Noise-Free Ego-Motion and 3D Reconstruction from Noisy Video	Proposes CorrGS for robust ego-motion estimation and 3D reconstruction from noisy video.	gaussian splatting, splatting
5	Dense-SfM: Structure from Motion with Dense Consistent Matching	Dense-SfM: an accurate 3D reconstruction framework that combines dense matching with Gaussian Splatting.	gaussian splatting, splatting	✅
6	SyncAnimation: A Real-Time End-to-End Framework for Audio-Driven Human Pose and Talking Head Animation	SyncAnimation: the first NeRF-based, real-time, end-to-end framework for audio-driven facial and full-body animation.	NeRF
7	Rethinking Encoder-Decoder Flow Through Shared Structures	Proposes shared structures ("banks") that enhance the decoder and improve Transformer-based depth estimation.	depth estimation

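🔬 Pillar 9: Embodied Foundation Models (5 papers)

#	Title	One-line Summary	Tags	🔗	⭐
8	Leveraging ChatGPT's Multimodal Vision Capabilities to Rank Satellite Images by Poverty Level: Advancing Tools for Social Science Research	Uses ChatGPT's multimodal vision capabilities to assess poverty levels from satellite images, advancing tools for social science research.	large language model, multimodal
9	Triple Path Enhanced Neural Architecture Search for Multimodal Fake News Detection	Proposes the MUSE model, which tackles multimodal fake news detection through triple-path enhanced neural architecture search.	multimodal
10	Measuring and Mitigating Hallucinations in Vision-Language Dataset Generation for Remote Sensing	Proposes a remote sensing image-text dataset generation method that incorporates map information to mitigate hallucinations.	large language model, multimodal
11	Global Semantic-Guided Sub-image Feature Weight Allocation in High-Resolution Large Vision-Language Models	Proposes the GSWA module, which dynamically assigns semantic weights to sub-images in high-resolution LVLMs to improve visual understanding.	multimodal
12	Dynamic Token Reduction during Generation for Vision Language Models	Proposes Dynamic Rate (DyRate) to reduce visual token redundancy during generation in vision-language models.	multimodal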

🔬 Pillar 2: RL Algorithms & Architecture (4 papers)

#	Title	One-line Summary	Tags	🔗	⭐
13	HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation	HERMES: a unified self-driving world model for simultaneous 3D scene understanding and generation.	world model, scene understanding, large language model	✅
14	Enhancing Multimodal Entity Linking with Jaccard Distance-based Conditional Contrastive Learning and Contextual Visual Augmentation	Proposes a multimodal entity linking method based on Jaccard distance-based conditional contrastive learning and contextual visual augmentation.	contrastive learning, multimodal
15	Surface Vision Mamba: Leveraging Bidirectional State Space Model for Efficient Spherical Manifold Representation	Proposes Surface Vision Mamba for efficient spherical manifold representation and neurodevelopmental phenotype regression.	Mamba, state space model	✅
16	Dreamweaver: Learning Compositional World Models from Pixels	Dreamweaver: a method for learning compositional world models from pixels, applied to video decomposition and future prediction.	world model

🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	One-line Summary	Tags	🔗	⭐
17	ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations	Proposes ReferDINO, which builds on visual grounding foundation models for referring video object segmentation.	spatiotemporal visual grounding
