cs.CV (2024-11-13)

📊 24 papers | 🔗 6 with code

🎯 Areas of Interest

Pillar 9: Embodied Foundation Models (8 🔗2) | Pillar 3: Perception & Semantics (6 🔗2) | Pillar 2: RL & Architecture (6 🔗1) | Pillar 5: Interaction & Reaction (1) | Pillar 1: Robot Control (1) | Pillar 4: Generative Motion (1 🔗1) | Pillar 8: Physics-based Animation (1)

🔬 Pillar 9: Embodied Foundation Models (8 papers)

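#	Title	Key Point	Tags	🔗	⭐
1	Weakly Supervised Temporal Action Localization via Dual-Prior Collaborative Learning Guided by Multimodal Large Language Models	Proposes the MLLM4WTAL framework, which uses multimodal large language models to guide weakly supervised temporal action localization.	large language model foundation model multimodal
2	Multimodal Object Detection using Depth and Image Data for Manufacturing Parts	Proposes a multimodal object detection method that combines depth and image data to make manufacturing-part recognition more reliable.	multimodal
3	Bridging the Visual Gap: Fine-Tuning Multimodal Models with Knowledge-Adapted Captions	Proposes KnowAda, a fine-tuning method that makes small vision-language models more accurate when generating knowledge-rich image captions and reduces hallucinations.	multimodal
4	MIRe: Enhancing Multimodal Queries Representation via Fusion-Free Modality Interaction for Multimodal Retrieval	MIRe enhances multimodal query representations through fusion-free modality interaction for multimodal retrieval.	multimodal	✅
5	Retrieval Augmented Recipe Generation	Proposes a retrieval-augmented large multimodal model to address hallucination in recipe generation.	multimodal
6	LG-Gaze: Learning Geometry-aware Continuous Prompts for Language-Guided Gaze Estimation	Proposes LG-Gaze, which learns geometry-aware continuous prompts for language-guided gaze estimation.	multimodal
7	The VLLM Safety Paradox: Dual Ease in Jailbreak Attack and Defense	Identifies a VLLM safety paradox in which jailbreak attacks and defenses are both easy, and proposes LLM-Pipeline as a defense.	large language model
8	ReMP: Reusable Motion Prior for Multi-domain 3D Human Pose Estimation and Motion Inbetweening	Proposes ReMP, a reusable motion prior for multi-domain 3D human pose estimation and motion inbetweening.	foundation model	✅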

🔬 Pillar 3: Perception & Semantics (6 papers)

#	Title	Key Point	Tags	🔗	⭐
9	MBA-SLAM: Motion Blur Aware Gaussian Splatting SLAM	Proposes MBA-SLAM for high-accuracy SLAM in motion-blurred scenes, improving camera localization and map reconstruction quality.	3D gaussian splatting 3DGS gaussian splatting	✅
10	4D Gaussian Splatting in the Wild with Uncertainty-Aware Regularization	Proposes 4D Gaussian Splatting with uncertainty-aware regularization to reconstruct dynamic scenes from in-the-wild monocular videos.	gaussian splatting splatting scene reconstruction
11	Biomass phenotyping of oilseed rape through UAV multi-view oblique imaging with 3DGS and SAM model	Combines 3DGS with the SAM model for high-accuracy 3D reconstruction and biomass phenotyping of oilseed rape.	3D gaussian splatting 3DGS gaussian splatting
12	BillBoard Splatting (BBSplat): Learnable Textured Primitives for Novel View Synthesis	BBSplat is a novel view synthesis method built on learnable textured primitives.	3D gaussian splatting 3DGS gaussian splatting
13	Towards More Accurate Fake Detection on Images Generated from Advanced Generative and Neural Rendering Models	Proposes an unsupervised method based on Fourier spectrum magnitude features to detect fake images produced by neural rendering more accurately.	3D gaussian splatting gaussian splatting splatting
14	OSMLoc: Single Image-Based Visual Localization in OpenStreetMap with Fused Geometric and Semantic Guidance	OSMLoc performs single-image visual localization in OpenStreetMap using fused geometric and semantic guidance.	depth estimation monocular depth scene understanding	✅

🔬 Pillar 2: RL & Architecture (6 papers)

#	Title	Key Point	Tags	🔗	⭐
15	Multimodal Instruction Tuning with Hybrid State Space Models	Proposes a hybrid Transformer-Mamba model that efficiently handles long-context multimodal inputs.	Mamba state space model large language model
16	MambaXCTrack: Mamba-based Tracker with SSM Cross-correlation and Motion Prompt for Ultrasound Needle Tracking	Proposes MambaXCTrack to address poor needle visibility in ultrasound needle tracking.	Mamba SSM state space model
17	EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation	Introduces EgoVid-5M, a large-scale egocentric video-action dataset for improving egocentric video generation.	dreamer egocentric
18	Scale Contrastive Learning with Selective Attentions for Blind Image Quality Assessment	Proposes the CSFIQA framework, which combines selective attention with contrastive learning to improve blind image quality assessment.	contrastive learning
19	Dual-Head Knowledge Distillation: Enhancing Logits Utilization with an Auxiliary Head	Proposes Dual-Head Knowledge Distillation (DHKD) to address underuse of logits information and classification-head collapse.	distillation	✅
20	A survey on Graph Deep Representation Learning for Facial Expression Recognition	A survey of graph deep representation learning for facial expression recognition.	representation learning

🔬 Pillar 5: Interaction & Reaction (1 paper)

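#	Title	Key Point	Tags	🔗	⭐
21	CoMiX: Cross-Modal Fusion with Deformable Convolutions for HSI-X Semantic Segmentation	Proposes CoMiX, which performs cross-modal fusion with deformable convolutions to improve hyperspectral image (HSI) semantic segmentation.	HSI multimodal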

🔬 Pillar 1: Robot Control (1 paper)

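#	Title	Key Point	Tags	🔗	⭐
22	A Survey on Vision Autoregressive Model	Surveys vision autoregressive models across tasks including image generation, video generation, and unified multimodal generation.	manipulation motion generation multimodal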

🔬 Pillar 4: Generative Motion (1 paper)

#	Title	Key Point	Tags	🔗	⭐
23	Motion Control for Enhanced Complex Action Video Generation	MVideo is a framework for generating complex-action videos, using mask sequences as motion control.	motion generation	✅

🔬 Pillar 8: Physics-based Animation (1 paper)

#	Title	Key Point	Tags	🔗	⭐
24	MikuDance: Animating Character Art with Mixed Motion Dynamics	MikuDance is a diffusion model that animates character art using mixed motion dynamics.	motion tracking
