cs.CV (2024-12-27)

📊 18 papers in total | 🔗 10 with code

🎯 Interest area navigation

Pillar 3: Spatial Perception & Semantics (7 🔗5) · Pillar 9: Embodied Foundation Models (7 🔗2) · Pillar 5: Interaction & Reaction (1 🔗1) · Pillar 1: Robot Control (1 🔗1) · Pillar 2: RL Algorithms & Architecture (1) · Pillar 6: Video Extraction & Matching (1 🔗1)

🔬 Pillar 3: Spatial Perception & Semantics (7 papers)

#	Title	One-line summary	Tags	🔗	⭐
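1	MLLM-SUL: Multimodal Large Language Model for Semantic Scene Understanding and Localization in Traffic Scenarios	Proposes the MLLM-SUL framework, which uses a multimodal large language model for semantic scene understanding and risk localization in traffic scenarios.	scene understanding, large language model, multimodal	✅
2	DAS3R: Dynamics-Aware Gaussian Splatting for Static Scene Reconstruction	DAS3R: a dynamics-aware Gaussian splatting method for static scene reconstruction.	gaussian splatting, splatting, scene reconstruction	✅
3	Dust to Tower: Coarse-to-Fine Photo-Realistic Scene Reconstruction from Sparse Uncalibrated Images	Proposes Dust to Tower, a coarse-to-fine method for photo-realistic scene reconstruction from sparse, uncalibrated images.	3D gaussian splatting, 3DGS, gaussian splatting
4	Learning Radiance Fields from a Single Snapshot Compressive Image	Proposes SCINeRF and SCISplat, which learn radiance fields from a single snapshot compressive image for high-quality 3D reconstruction and fast rendering.	3D gaussian splatting, 3DGS, gaussian splatting
5	Towards Open-Vocabulary Remote Sensing Image Semantic Segmentation	Proposes the GSNet framework and the LandDiscover50K dataset for open-vocabulary semantic segmentation of remote sensing images.	open-vocabulary, open vocabulary	✅
6	Sharpening Neural Implicit Functions with Frequency Consolidation Priors	Proposes frequency consolidation priors that sharpen neural implicit functions and improve their performance.	implicit representation	✅
7	Generalized Uncertainty-Based Evidential Fusion with Hybrid Multi-Head Attention for Weak-Supervised Temporal Action Localization	Proposes generalized uncertainty-based evidential fusion with a hybrid multi-head attention mechanism to resolve action-background confusion in weakly supervised temporal action localization.	optical flow	✅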

🔬 Pillar 9: Embodied Foundation Models (7 papers)

#	Title	One-line summary	Tags	🔗	⭐
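8	CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs	CAD-GPT: synthesizes CAD construction sequences with a multimodal LLM enhanced for spatial reasoning.	large language model, multimodal
9	Not all Views are Created Equal: Analyzing Viewpoint Instabilities in Vision Foundation Models	Analyzes viewpoint instabilities in vision foundation models, revealing a generalization gap on 3D reasoning tasks.	foundation model
10	A Large-scale Interpretable Multi-modality Benchmark for Facial Image Forgery Localization	Proposes the MMTT dataset and the ForgeryTalker model for interpretable facial image forgery localization.	large language model, multimodal
11	From Elements to Design: A Layered Approach for Automatic Graphic Design Composition	Proposes LaDeCo, a layered approach to automatic graphic design composition.	multimodal
12	MBQ: Modality-Balanced Quantization for Large Vision-Language Models	Proposes Modality-Balanced Quantization (MBQ) to improve the accuracy of large vision-language models after quantization.	large language model	✅
13	Temporal Context Consistency Above All: Enhancing Long-Term Anticipation by Learning and Enforcing Temporal Constraints	Proposes learning and enforcing temporal context consistency to improve long-term action anticipation.	large language model
14	MINIMA: Modality Invariant Image Matching	MINIMA: a modality-invariant image matching framework that addresses the poor generalization of cross-modal image matching.	multimodal	✅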

🔬 Pillar 5: Interaction & Reaction (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
15	Interacted Object Grounding in Spatio-Temporal Human-Object Interactions	Proposes the GIO benchmark and the 4D-QA framework to tackle open-world object grounding in spatio-temporal human-object interactions.	human-object interaction, HOI	✅

🔬 Pillar 1: Robot Control (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
16	MVTamperBench: Evaluating Robustness of Vision-Language Models	MVTamperBench: evaluates the robustness of vision-language models against video tampering.	manipulation, large language model, multimodal	✅

🔬 Pillar 2: RL Algorithms & Architecture (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
17	Image Classification with Deep Reinforcement Active Learning	Proposes an active learning method based on deep reinforcement learning to address label scarcity in image classification.	reinforcement learning, deep reinforcement learning

🔬 Pillar 6: Video Extraction & Matching (1 paper)

#	Title	One-line summary	Tags	🔗	⭐
18	Optimizing Local-Global Dependencies for Accurate 3D Human Pose Estimation	Proposes SSR-STF, a dual-stream model that optimizes local-global dependencies to improve 3D human pose estimation accuracy.	human mesh recovery	✅