cs.CV (2025-01-21)

📊 25 papers in total | 🔗 4 with code

🎯 Topic Navigation

Pillar 3: Perception & Semantics (9 🔗1) · Pillar 9: Embodied Foundation Models (7 🔗1) · Pillar 2: RL & Architecture (7 🔗2) · Pillar 7: Motion Retargeting (1) · Pillar 4: Generative Motion (1)

🔬 Pillar 3: Perception & Semantics (9 papers)

#	Title	Key Point	Tags	🔗	⭐
1	HAC++: Towards 100X Compression of 3D Gaussian Splatting	HAC++ pushes 3D Gaussian Splatting compression toward 100x while improving rendering fidelity.	3D gaussian splatting 3DGS gaussian splatting	✅
2	Survey on Monocular Metric Depth Estimation	Surveys monocular metric depth estimation methods and how they address consistency in depth prediction.	visual SLAM depth estimation monocular depth
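3	Video Depth Anything: Consistent Depth Estimation for Super-Long Videos	Video Depth Anything delivers temporally consistent depth estimation for super-long videos.	depth estimation monocular depth Depth Anything
4	Towards Affordance-Aware Articulation Synthesis for Rigged Objects	Proposes A3Syn, which automatically synthesizes affordance-aware articulations for open-domain rigged objects.	affordance affordance-aware
5	GSVC: Efficient Video Representation and Compression Through 2D Gaussian Splatting	Proposes GSVC, which represents and compresses video efficiently using 2D Gaussian splats.	gaussian splatting splatting
6	Fast Underwater Scene Reconstruction using Multi-View Stereo and Physical Imaging	Proposes a fast underwater multi-view stereo reconstruction method grounded in physical imaging, improving both reconstruction quality and efficiency.	depth estimation NeRF neural radiance field
7	DARB-Splatting: Generalizing Splatting with Decaying Anisotropic Radial Basis Functions	Proposes a splatting method based on decaying anisotropic radial basis functions (DARB) that trains faster and uses less memory.	3D gaussian splatting gaussian splatting splatting
8	Learning segmentation from point trajectories	Learns video segmentation from point trajectories without extra supervision.	optical flow
9	Continuous 3D Perception Model with Persistent State	Proposes CUT3R, a recurrent model with a persistent state for continuous 3D perception tasks.	scene reconstruction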

🔬 Pillar 9: Embodied Foundation Models (7 papers)

#	Title	Key Point	Tags	🔗	⭐
10	VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model	VARGPT is a visual autoregressive multimodal LLM that unifies understanding and generation tasks.	large language model multimodal instruction following
11	EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents	EmbodiedEval is a comprehensive, interactive benchmark for evaluating multimodal LLMs on embodied AI tasks.	embodied AI large language model multimodal	✅
12	InsTALL: Context-aware Instructional Task Assistance with Multi-modal Large Language Models	InsTALL uses multimodal LLMs to build a context-aware instructional task assistant.	large language model multimodal
13	Explainability for Vision Foundation Models: A Survey	A survey of research progress and open challenges in explainability for vision foundation models.	foundation model
14	ComposeAnyone: Controllable Layout-to-Human Generation with Decoupled Multimodal Conditions	ComposeAnyone proposes controllable layout-to-human generation with decoupled multimodal conditions.	multimodal
15	Are Traditional Deep Learning Model Approaches as Effective as a Retinal-Specific Foundation Model for Ocular and Systemic Disease Detection?	Compares the retina-specific foundation model RETFound with traditional deep learning models for detecting ocular and systemic diseases.	foundation model
16	MMVU: Measuring Expert-Level Multi-Discipline Video Understanding	MMVU is an expert-level, multi-discipline video understanding benchmark that tests general-purpose models' knowledge and reasoning in specialized domains.	foundation model multimodal

🔬 Pillar 2: RL & Architecture (7 papers)

#	Title	Key Point	Tags	🔗	⭐
17	High-dimensional multimodal uncertainty estimation by manifold alignment: Application to 3D right ventricular strain computations	Proposes high-dimensional multimodal uncertainty estimation via manifold alignment, applied to 3D right ventricular strain computation.	representation learning multimodal
18	InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling	InternVideo2.5 strengthens video multimodal LLMs through long and rich context modeling.	direct preference optimization spatiotemporal large language model	✅
19	Contrastive Masked Autoencoders for Character-Level Open-Set Writer Identification	Proposes CMAE (contrastive masked autoencoders) for character-level open-set writer identification.	representation learning masked autoencoder MAE
20	Memory Storyboard: Leveraging Temporal Segmentation for Streaming Self-Supervised Learning from Egocentric Videos	Proposes Memory Storyboard, which uses temporal segmentation for streaming self-supervised learning from egocentric videos.	contrastive learning egocentric
21	DNRSelect: Active Best View Selection for Deferred Neural Rendering	DNRSelect is an active best-view selection method for deferred neural rendering.	reinforcement learning NeRF geometric consistency
22	SMamba: Sparse Mamba for Event-based Object Detection	Proposes SMamba, a sparse Mamba architecture that improves the efficiency and accuracy of event-camera object detection.	Mamba spatiotemporal
23	InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model	Proposes InternLM-XComposer2.5-Reward, a simple yet effective multimodal reward model for improving the generation quality of LVLMs.	reinforcement learning PPO instruction following	✅

🔬 Pillar 7: Motion Retargeting (1 paper)

#	Title	Key Point	Tags	🔗	⭐
24	Cinepro: Robust Training of Foundation Models for Cancer Detection in Prostate Ultrasound Cineloops	Cinepro uses robust training to improve foundation-model cancer detection in prostate ultrasound cineloops.	spatial relationship foundation model

🔬 Pillar 4: Generative Motion (1 paper)

#	Title	Key Point	Tags	🔗	⭐
25	Regressor-Guided Generative Image Editing Balances User Emotions to Reduce Time Spent Online	Proposes regressor-guided generative image editing that balances user emotions to reduce time spent online.	classifier-free guidance
