I-JEPA

Image-based Joint-Embedding Predictive Architecture

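Image-based Joint-Embedding Predictive Architecture (I-JEPA) - A self-supervised framework that trains a pair of Vision Transformer (ViT) encoders, a context encoder and a target encoder, together with a predictor to learn a joint embedding. Given the visible part of an input tile, the model learns to predict the embeddings of the masked-out parts, so the prediction target is a learned representation rather than raw pixel values [1].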
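To make the training signal concrete, here is a minimal sketch of one forward pass in pure Python. It is a toy under several assumptions, not the implementation from the paper. The ViT encoders are replaced by a single linear map with a tanh. The predictor pools the context embeddings and applies one linear map, where the real predictor is conditioned on the position of each masked patch. The loss is a mean squared error. The names `encode`, `ema_update`, `ijepa_loss`, and `random_matrix`, along with every size and the momentum value, are hypothetical choices made for illustration.

```python
import math
import random


def encode(weights, patches):
    """Toy stand-in for a ViT encoder: a linear map plus tanh, applied per patch.

    `weights` holds one weight vector per embedding dimension.
    """
    return [
        [math.tanh(sum(x * w for x, w in zip(patch, col))) for col in weights]
        for patch in patches
    ]


def ema_update(target_w, context_w, momentum=0.99):
    """Move the target weights toward the context weights (exponential moving average)."""
    return [
        [momentum * t + (1.0 - momentum) * c for t, c in zip(t_col, c_col)]
        for t_col, c_col in zip(target_w, context_w)
    ]


def ijepa_loss(patches, target_idx, context_w, target_w, predictor_w):
    """Mean squared error between predicted and actual embeddings of masked patches."""
    masked = set(target_idx)
    # The context encoder only sees the patches that are not masked out.
    visible = [p for i, p in enumerate(patches) if i not in masked]
    context_emb = encode(context_w, visible)
    # The target encoder sees the whole tile; keep only the masked positions.
    full_emb = encode(target_w, patches)
    target_emb = [full_emb[i] for i in target_idx]
    # Toy predictor (assumption): average the context embeddings, then apply
    # one linear map, and use the same prediction for every masked patch.
    dim = len(context_emb[0])
    pooled = [sum(e[d] for e in context_emb) / len(context_emb) for d in range(dim)]
    pred = [sum(x * w for x, w in zip(pooled, col)) for col in predictor_w]
    errors = [(p - t) ** 2 for emb in target_emb for p, t in zip(pred, emb)]
    return sum(errors) / len(errors)


def random_matrix(rows, cols, scale=1.0):
    """Hypothetical helper: a rows x cols list of Gaussian samples."""
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
patches = random_matrix(16, 12)            # 16 patches of 12 values each
context_w = random_matrix(8, 12, 0.1)      # 8 embedding dimensions
target_w = [col[:] for col in context_w]   # target encoder starts as a copy
predictor_w = random_matrix(8, 8, 0.1)
target_idx = [5, 6, 9, 10]                 # indices of the masked-out patches

loss = ijepa_loss(patches, target_idx, context_w, target_w, predictor_w)
# In training, a gradient step on `loss` would update context_w and predictor_w
# here (omitted); the target encoder is then updated by moving average only.
target_w = ema_update(target_w, context_w)
```

The point of the sketch is where the error is measured: `ijepa_loss` compares embeddings with embeddings and never reconstructs the masked pixels. The target encoder takes no gradient step and only follows the context encoder through `ema_update`.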

References:

  1. Mahmoud Assran, Quentin Duval, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Yann LeCun, Nicolas Ballas: “Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture”, 2023; arXiv:2301.08243.
