cs.CV (2024-11-04)

📊 27 papers in total | 🔗 10 with code

🎯 Interest Area Navigation

Pillar 9: Embodied Foundation Models (10 🔗3) Pillar 3: Spatial Perception & Semantics (9 🔗3) Pillar 2: RL Algorithms & Architecture (5 🔗2) Pillar 6: Video Extraction & Matching (2 🔗1) Pillar 1: Robot Control (1 🔗1)

🔬 Pillar 9: Embodied Foundation Models (10 papers)

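#	Title	One-line summary	Tags	🔗	⭐
1	ChatTracker: Enhancing Visual Tracking Performance via Chatting with Multimodal Large Language Model	ChatTracker: uses a multimodal large language model to improve visual tracking performance.	large language model multimodal
2	KptLLM: Unveiling the Power of Large Language Model for Keypoint Comprehension	Proposes KptLLM, which uses a large language model for semantic keypoint comprehension, addressing the difficulty of capturing pixel-level semantic detail.	large language model multimodal chain-of-thought
3	Digi2Real: Bridging the Realism Gap in Synthetic Data Face Recognition via Foundation Models	Digi2Real: uses a face foundation model to bridge the realism gap in face recognition trained on synthetic data.	foundation model
4	A Novel Deep Learning Tractography Fiber Clustering Framework for Functionally Consistent White Matter Parcellation Using Multimodal Diffusion MRI and Functional MRI	Proposes the Deep Multi-view Fiber Clustering (DMVFC) framework for functionally consistent white matter parcellation.	multimodal
5	3D Audio-Visual Segmentation	Proposes EchoSegnet to address sound-based object segmentation in 3D scenes.	embodied AI foundation model	✅
6	Multi-Transmotion: Pre-trained Model for Human Motion Prediction	Multi-Transmotion: a cross-modal pre-trained model for human motion prediction.	multimodal	✅
7	Adaptive Length Image Tokenization via Recurrent Allocation	Proposes an adaptive-length image tokenization method based on recurrent allocation, improving the representational efficiency of vision systems.	large language model
8	AM Flow: Adapters for Temporal Processing in Action Recognition	Proposes AM Flow and temporal processing adapters to improve the temporal modeling ability of image models in action recognition.	foundation model
9	SPECTRUM: Semantic Processing and Emotion-informed video-Captioning Through Retrieval and Understanding Modalities	SPECTRUM: proposes a video captioning framework that integrates semantic processing and emotional information.	multimodal
10	Learning Where to Edit Vision Transformers	Proposes a hypernetwork-based ViT editing method that improves generalization and locality under subpopulation shift.	large language model	✅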

🔬 Pillar 3: Spatial Perception & Semantics (9 papers)

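#	Title	One-line summary	Tags	🔗	⭐
11	FewViewGS: Gaussian Splatting with Few View Matching and Multi-stage Training	FewViewGS: Gaussian splatting based on few-view matching and multi-stage training, improving novel view synthesis from sparse images.	depth estimation 3D gaussian splatting gaussian splatting
12	GVKF: Gaussian Voxel Kernel Functions for Highly Efficient Surface Reconstruction in Open Scenes	Proposes Gaussian voxel kernel functions for efficient 3D surface reconstruction in open scenes.	3D gaussian splatting 3DGS gaussian splatting
13	Improving Domain Generalization in Self-supervised Monocular Depth Estimation via Stabilized Adversarial Training	Proposes the SCAT framework, which improves domain generalization in self-supervised monocular depth estimation through stabilized adversarial training.	depth estimation monocular depth
14	Exploiting Unlabeled Data with Multiple Expert Teachers for Open Vocabulary Aerial Object Detection and Its Orientation Adaptation	Proposes CastDet to address weak features and arbitrary orientations in open-vocabulary aerial object detection.	open-vocabulary open vocabulary	✅
15	PMPNet: Pixel Movement Prediction Network for Monocular Depth Estimation in Dynamic Scenes	PMPNet: a pixel movement prediction network for monocular depth estimation in dynamic scenes.	depth estimation monocular depth
16	A Probabilistic Formulation of LiDAR Mapping with Neural Radiance Fields	Proposes a probabilistic NeRF-based LiDAR mapping method that addresses phantom surfaces caused by multiple reflections.	NeRF neural radiance field PULSE	✅
17	Map++: Towards User-Participatory Visual SLAM Systems with Efficient Map Expansion and Sharing	Map++: a user-participatory visual SLAM system with efficient map expansion and sharing.	visual SLAM
18	Communicate Less, Synthesize the Rest: Latency-aware Intent-based Generative Semantic Multicasting with Diffusion Models	Proposes a latency-aware, intent-based generative semantic multicasting framework that uses diffusion models to reduce communication volume.	semantic map
19	Multi-task Geometric Estimation of Depth and Surface Normal from Monocular 360° Images	Proposes a multi-task learning network for geometric estimation of depth and surface normals from monocular 360° images.	scene understanding	✅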

🔬 Pillar 2: RL Algorithms & Architecture (5 papers)

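#	Title	One-line summary	Tags	🔗	⭐
20	Learning General-Purpose Biomedical Volume Representations using Randomized Synthesis	Proposes a general-purpose representation learning method for biomedical volumes based on randomized synthesis, improving model generalization.	representation learning contrastive learning foundation model
21	PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance	PPLLaVA: proposes a prompt-guided pooling strategy for unified understanding of short and long videos.	DPO direct preference optimization large language model	✅
22	Masked Autoencoders are Parameter-Efficient Federated Continual Learners	Proposes pMAE, a parameter-efficient federated continual learning method that addresses catastrophic forgetting and non-IID data.	masked autoencoder MAE	✅
23	How Far is Video Generation from World Model: A Physical Law Perspective	Evaluates the world-model capability and generalization mechanisms of video generation models from a physical-law perspective.	world model
24	Rotation Perturbation Robustness in Point Cloud Analysis: A Perspective of Manifold Distillation	Proposes a manifold-distillation-based method for robustness to rotation perturbations in point clouds.	distillation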

🔬 Pillar 6: Video Extraction & Matching (2 papers)

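#	Title	One-line summary	Tags	🔗	⭐
25	TI-PREGO: Chain of Thought and In-Context Learning for Online Mistake Detection in PRocedural EGOcentric Videos	TI-PREGO: uses chain-of-thought and in-context learning for online mistake detection in procedural egocentric videos.	egocentric large language model chain-of-thought
26	Semantic-Aligned Adversarial Evolution Triangle for High-Transferability Vision-Language Attack	Proposes a semantic-aligned adversarial evolution triangle method that improves the transferability of adversarial examples against vision-language models.	feature matching multimodal	✅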

🔬 Pillar 1: Robot Control (1 paper)

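#	Title	One-line summary	Tags	🔗	⭐
27	Training-free Regional Prompting for Diffusion Transformers	Proposes a training-free regional prompting method that improves fine-grained control of Diffusion Transformers when generating from complex text prompts.	manipulation spatial relationship large language model	✅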
