cs.CV (2024-08-29)

📊 25 papers in total | 🔗 8 with code

🎯 Interest Area Navigation

Pillar 3: Perception & Semantics (8 🔗2) · Pillar 9: Embodied Foundation Models (7 🔗3) · Pillar 2: RL & Architecture (3) · Pillar 1: Robot Control (2) · Pillar 6: Video Extraction (2 🔗1) · Pillar 7: Motion Retargeting (1 🔗1) · Pillar 5: Interaction & Reaction (1 🔗1) · Pillar 8: Physics-based Animation (1)

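🔬 Pillar 3: Perception & Semantics (8 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
1	OmniRe: Omni Urban Scene Reconstruction	OmniRe: builds high-fidelity digital twins of dynamic urban scenes, with support for reconstructing all dynamic foreground objects	3DGS gaussian splatting splatting
2	ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model	ReconX: reconstructs arbitrary scenes from sparse views using a video diffusion model	3D gaussian splatting gaussian splatting splatting
3	NeRF-CA: Dynamic Reconstruction of X-ray Coronary Angiography with Extremely Sparse-views	NeRF-CA: proposes a dynamic reconstruction method for X-ray coronary angiography from extremely sparse views	NeRF neural radiance field	✅
4	EvLight++: Low-Light Video Enhancement with an Event Camera: A Large-Scale Real-World Dataset, Novel Method, and More	EvLight++: proposes an event-camera-guided low-light video enhancement method and builds a large-scale real-world dataset	depth estimation monocular depth foundation model
5	Creating a Segmented Pointcloud of Grapevines by Combining Multiple Viewpoints Through Visual Odometry	Fuses multiple viewpoints via visual odometry to build a segmented point cloud of grapevines for winter pruning	visual odometry
6	Generic Objects as Pose Probes for Few-shot View Synthesis	Proposes PoseProbe, which uses common objects as pose probes to address few-view NeRF reconstruction	NeRF scene reconstruction	✅
7	Beyond Uncertainty: Evidential Deep Learning for Robust Video Temporal Grounding	Proposes an SRAM module based on evidential deep learning to improve the robustness of video temporal grounding in open environments	open-vocabulary open vocabulary
8	Spurfies: Sparse Surface Reconstruction using Local Geometry Priors	Spurfies: a sparse surface reconstruction method using local geometry priors	NeRF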

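🔬 Pillar 9: Embodied Foundation Models (7 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
9	Toward Robust Early Detection of Alzheimer's Disease via an Integrated Multimodal Learning Approach	Proposes MSTNet, a multimodal-fusion model for early diagnosis of Alzheimer's disease	multimodal	✅
10	Bootstrap Segmentation Foundation Model under Distribution Shift via Object-Centric Learning	Proposes SlotSAM, which uses object-centric learning to improve the generalization of segmentation foundation models under distribution shift	foundation model
11	Rethinking Sparse Lexical Representations for Image Retrieval in the Age of Rising Multi-Modal Large Language Models	Rebuilds sparse lexical representations for image retrieval using multimodal large language models	large language model
12	GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models	Proposes the GradBias framework to reveal how individual words influence bias in text-to-image generative models	large language model foundation model	✅
13	Law of Vision Representation in MLLMs	Proposes a law of vision representation for multimodal large language models (MLLMs) and optimizes vision representations via the AC score	large language model multimodal
14	CogVLM2: Visual Language Models for Image and Video Understanding	CogVLM2: visual language models for image and video understanding, supporting high resolution and temporal modeling	TAMP	✅
15	Exploiting temporal information to detect conversational groups in videos and predict the next speaker	Uses temporal information to detect conversational groups in videos and predict the next speaker	multimodal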

🔬 Pillar 2: RL & Architecture (3 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
16	COIN: Control-Inpainting Diffusion Prior for Human and Camera Motion Estimation	COIN: a control-inpainting diffusion prior for human and camera motion estimation	distillation motion diffusion model motion diffusion
17	VLM-KD: Knowledge Distillation from VLM for Long-Tail Visual Recognition	Proposes VLM-KD, which distills knowledge from vision-language models to improve long-tail visual recognition	distillation
18	UDD: Dataset Distillation via Mining Underutilized Regions	UDD: dataset distillation by mining underutilized regions, improving the utilization of synthetic data	distillation

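🔬 Pillar 1: Robot Control (2 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
19	FastForensics: Efficient Two-Stream Design for Real-Time Image Manipulation Detection	Proposes FastForensics, an efficient two-stream architecture for real-time image manipulation detection	manipulation
20	Enabling Local Editing in Diffusion Models by Joint and Individual Component Analysis	Proposes a local editing method for diffusion models based on joint and individual component analysis	manipulation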

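🔬 Pillar 6: Video Extraction (2 papers)

#	Title	One-Sentence Summary	Tags	🔗	⭐
21	VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation	VideoLLM-MoD: efficient video-language streaming with mixture-of-depths vision computation	Ego4D
22	3D Pose-Based Temporal Action Segmentation for Figure Skating: A Fine-Grained and Jump Procedure-Aware Annotation Approach	Proposes a 3D pose-based action segmentation and fine-grained annotation method for figure skating, addressing the lack of 3D pose datasets and the learning of jump procedures	markerless motion capture	✅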

🔬 Pillar 7: Motion Retargeting (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
23	ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual Grounding	ResVG: enhances relation and semantic understanding to resolve interference from multiple instances in visual grounding	spatial relationship visual grounding	✅

🔬 Pillar 5: Interaction & Reaction (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
24	FineFACE: Fair Facial Attribute Classification Leveraging Fine-grained Features	FineFACE: fair facial attribute classification using fine-grained features	mutual attention	✅

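🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	One-Sentence Summary	Tags	🔗	⭐
25	Alignment is All You Need: A Training-free Augmentation Strategy for Pose-guided Video Generation	Proposes a training-free augmentation strategy for pose-guided video generation, addressing appearance consistency in animation generation	character animation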
