cs.CV (2024-11-05)

📊 20 papers in total | 🔗 5 with code

🎯 Areas of Interest

Pillar 2: RL Algorithms & Architecture (7 🔗3) · Pillar 9: Embodied Foundation Models (6 🔗2) · Pillar 3: Spatial Perception & Semantics (5) · Pillar 6: Video Extraction & Matching (1) · Pillar 1: Robot Control (1)

🔬 Pillar 2: RL Algorithms & Architecture (7 papers)

#	Title	One-line Summary	Tags	🔗	⭐
1	Efficient and Effective Adaptation of Multimodal Foundation Models in Sequential Recommendation	Proposes the IISAN-Versa framework, which efficiently adapts multimodal foundation models to sequential recommendation and achieves SOTA performance.	representation learning, large language model, foundation model	✅
2	Object and Contact Point Tracking in Demonstrations Using 3D Gaussian Splatting	Uses 3D Gaussian Splatting to track objects and contact points in interaction demonstrations.	imitation learning, 3D gaussian splatting, gaussian splatting
3	V-DPO: Mitigating Hallucination in Large Vision Language Models via Vision-Guided Direct Preference Optimization	Proposes V-DPO, which mitigates hallucination in large vision-language models through vision-guided direct preference optimization.	preference learning, DPO, direct preference optimization	✅
4	Estimating Ego-Body Pose from Doubly Sparse Egocentric Video Data	Proposes an ego-body pose estimation method for doubly sparse egocentric video, improving motion-capture accuracy.	masked autoencoder, egocentric, spatiotemporal
5	LiVOS: Light Video Object Segmentation with Gated Linear Matching	LiVOS: lightweight video object segmentation via gated linear matching.	linear attention, spatiotemporal, foundation model
6	Pre-trained Visual Dynamics Representations for Efficient Policy Learning	Proposes PVDR, which uses pre-trained visual dynamics representations to make reinforcement-learning policy learning more efficient.	reinforcement learning, policy learning
7	ShadowMamba: State-Space Model with Boundary-Region Selective Scan for Shadow Removal	ShadowMamba: a state-space model with boundary-region selective scan for shadow removal.	Mamba	✅

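🔬 Pillar 9: Embodied Foundation Models (6 papers)

#	Title	One-line Summary	Tags	🔗	⭐
8	Personalized Video Summarization by Multimodal Video Understanding	Proposes a personalized video summarization method based on multimodal video understanding, improving the user experience.	multimodal
9	MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning	MME-Finance: a multimodal benchmark for expert-level understanding and reasoning in the finance domain.	multimodal
10	Fine-Grained Spatial and Verbal Losses for 3D Visual Grounding	Proposes AsphaltNet, which improves 3D visual grounding with fine-grained spatial and verbal losses.	visual grounding
11	FlexCAD: Unified and Versatile Controllable CAD Generation with Fine-tuned Large Language Models	Proposes FlexCAD to address the inefficiency of controllable CAD generation.	large language model	✅
12	Inference Optimal VLMs Need Fewer Visual Tokens and More Parameters	Inference-optimal VLMs: under a fixed inference compute budget, using fewer visual tokens and a larger model is more effective.	large language model	✅
13	CRT-Fusion: Camera, Radar, Temporal Fusion Using Motion Information for 3D Object Detection	CRT-Fusion: a 3D object detection method that fuses camera, radar, and temporal information.	TAMP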

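🔬 Pillar 3: Spatial Perception & Semantics (5 papers)

#	Title	One-line Summary	Tags	🔗	⭐
14	Multi-modal NeRF Self-Supervision for LiDAR Semantic Segmentation	Proposes a multi-modal NeRF self-supervision framework that improves LiDAR semantic segmentation in autonomous-driving scenes.	NeRF, foundation model
15	CAD-NeRF: Learning NeRFs from Uncalibrated Few-view Images by CAD Model Retrieval	CAD-NeRF: learns NeRFs from uncalibrated few-view images via CAD model retrieval.	NeRF, neural radiance field
16	Exploring Seasonal Variability in the Context of Neural Radiance Fields for 3D Reconstruction on Satellite Imagery	Proposes Planet-NeRF, which uses monthly embedding vectors to improve a satellite-imagery NeRF's ability to predict seasonal variation.	NeRF, neural radiance field
17	Correlation of Object Detection Performance with Visual Saliency and Depth Estimation	Studies how object detection performance correlates with visual saliency and depth estimation, offering guidance for optimizing model architectures.	depth estimation, Depth Anything
18	HFGaussian: Learning Generalizable Gaussian Human with Integrated Human Features	HFGaussian: a generalizable Gaussian human modeling method that integrates human features.	3D gaussian splatting, gaussian splatting, splatting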

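🔬 Pillar 6: Video Extraction & Matching (1 paper)

#	Title	One-line Summary	Tags	🔗	⭐
19	Self Supervised Networks for Learning Latent Space Representations of Human Body Scans and Motions	Proposes the self-supervised networks VariShaPE and MoGeN for learning latent-space representations of human body scans and motions.	SMPL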

🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-line Summary	Tags	🔗	⭐
20	Lost in Context: The Influence of Context on Feature Attribution Methods for Object Recognition	Studies how context affects feature attribution methods for object recognition models.	manipulation
