cs.CV (2024-10-30)

📊 23 papers in total | 🔗 5 with code

🎯 Navigation by Area of Interest

Pillar 3: Perception & Semantics (7 papers, 🔗 2 with code)
Pillar 9: Embodied Foundation Models (5 papers)
Pillar 2: RL Algorithms & Architecture (4 papers, 🔗 1 with code)
Pillar 8: Physics-based Animation (3 papers, 🔗 1 with code)
Pillar 6: Video Extraction & Matching (2 papers, 🔗 1 with code)
Pillar 1: Robot Control (1 paper)
Pillar 7: Motion Retargeting (1 paper)

🔬 Pillar 3: Perception & Semantics (7 papers)

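# | Title | One-line summary | Tags | 🔗
1 | Epipolar-Free 3D Gaussian Splatting for Generalizable Novel View Synthesis | Proposes eFreeSplat, which removes the dependence of conventional 3D Gaussian Splatting methods on epipolar constraints. | 3D gaussian splatting, 3DGS, gaussian splatting |
2 | ELMGS: Enhancing memory and computation scaLability through coMpression for 3D Gaussian Splatting | ELMGS uses compression to improve the memory and computational scalability of 3D Gaussian Splatting. | 3D gaussian splatting, gaussian splatting, splatting |
3 | LGU-SLAM: Learnable Gaussian Uncertainty Matching with Deformable Correlation Sampling for Deep Visual SLAM | LGU-SLAM is a deep visual SLAM system built on learnable Gaussian uncertainty matching and deformable correlation sampling. | visual odometry, visual SLAM |
4 | Bringing NeRFs to the Latent Space: Inverse Graphics Autoencoder | Proposes an Inverse Graphics Autoencoder that enables efficient NeRF training and high-quality rendering in latent space. | NeRF |
5 | Symbolic Graph Inference for Compound Scene Understanding | Proposes a compound scene understanding method based on symbolic graph inference. | scene understanding |
6 | Geometry Cloak: Preventing TGS-based 3D Reconstruction from Copyrighted Images | Proposes Geometry Cloak, which prevents TGS-based 3D reconstruction from copyrighted images. | gaussian splatting, splatting |
7 | SCRREAM: SCan, Register, REnder And Map: A Framework for Annotating Accurate and Dense 3D Indoor Scenes with a Benchmark | SCRREAM provides a framework for annotating dense 3D reconstructions of indoor scenes, plus a benchmark dataset, to improve accuracy on geometric tasks. | 6D pose estimation |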

🔬 Pillar 9: Embodied Foundation Models (5 papers)

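# | Title | One-line summary | Tags | 🔗
8 | TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models | Proposes the TOMATO benchmark for assessing the visual temporal reasoning capabilities of multimodal models in video understanding. | foundation model, multimodal |
9 | CrossEarth: Geospatial Vision Foundation Model for Domain Generalizable Remote Sensing Semantic Segmentation | CrossEarth is a geospatial vision foundation model for domain-generalizable remote sensing semantic segmentation. | foundation model |
10 | PIP-MM: Pre-Integrating Prompt Information into Visual Encoding via Existing MLLM Structures | PIP-MM improves multimodal large language model performance by integrating prompt information into the visual encoding stage. | large language model, multimodal |
11 | PV-VTT: A Privacy-Centric Dataset for Mission-Specific Anomaly Detection and Natural Language Interpretation | Proposes the PV-VTT dataset for privacy-violation anomaly detection and natural language interpretation tasks. | large language model, multimodal |
12 | CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIP | Proposes CLIPErase for efficient unlearning of visual-textual associations in CLIP models. | multimodal |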

🔬 Pillar 2: RL Algorithms & Architecture (4 papers)

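# | Title | One-line summary | Tags | 🔗
13 | EchoFM: Foundation Model for Generalizable Echocardiogram Analysis | Proposes EchoFM, a foundation model for generalizable echocardiogram analysis. | contrastive learning, foundation model |
14 | LoFLAT: Local Feature Matching using Focused Linear Attention Transformer | Proposes LoFLAT, which performs local feature matching with a Focused Linear Attention Transformer. | linear attention, feature matching |
15 | Adaptive Multi Scale Document Binarisation Using Vision Mamba | Proposes an adaptive multi-scale document binarisation method based on Vision Mamba that improves the readability of historical document images. | Mamba |
16 | AdaptiveISP: Learning an Adaptive Image Signal Processor for Object Detection | Proposes AdaptiveISP, a task-driven, scene-adaptive image signal processor that improves object detection performance. | reinforcement learning, deep reinforcement learning |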

🔬 Pillar 8: Physics-based Animation (3 papers)

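# | Title | One-line summary | Tags | 🔗
17 | First Place Solution to the ECCV 2024 ROAD++ Challenge @ ROAD++ Spatiotemporal Agent Detection 2024 | For the ROAD++ spatiotemporal agent detection challenge, proposes a multi-branch two-stream model that substantially improves detection of small objects and detection in low-light scenes. | spatiotemporal |
18 | bit2bit: 1-bit quanta video reconstruction via self-supervised photon prediction | Proposes bit2bit for reconstructing video from sparse binary (1-bit) quanta images. | spatiotemporal |
19 | Fourier Amplitude and Correlation Loss: Beyond Using L2 Loss for Skillful Precipitation Nowcasting | Proposes the FACL loss function, which improves the perceptual quality and meteorological skill scores of precipitation nowcasts. | spatiotemporal |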

🔬 Pillar 6: Video Extraction & Matching (2 papers)

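# | Title | One-line summary | Tags | 🔗
20 | ETO: Efficient Transformer-based Local Feature Matching by Organizing Multiple Homography Hypotheses | ETO performs efficient Transformer-based local feature matching by organizing multiple homography hypotheses. | feature matching |
21 | PointRecon: Online Point-based 3D Reconstruction via Ray-based 2D-3D Matching | Proposes an online point-based 3D reconstruction method built on ray-based 2D-3D matching, targeting real-time 3D reconstruction from monocular RGB video. | feature matching |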

🔬 Pillar 1: Robot Control (1 paper)

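# | Title | One-line summary | Tags | 🔗
22 | EMMA: End-to-End Multimodal Model for Autonomous Driving | EMMA is an end-to-end multimodal model for autonomous driving that unifies planning, perception, and road graph construction. | motion planning, large language model, multimodal |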

🔬 Pillar 7: Motion Retargeting (1 paper)

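# | Title | One-line summary | Tags | 🔗
23 | Automated Image-Based Identification and Consistent Classification of Fire Patterns with Quantitative Shape Analysis and Spatial Location Identification | Proposes an automated image-based framework for identifying and classifying fire patterns that improves the objectivity and accuracy of fire investigation. | spatial relationship |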
