The company said the model was trained on 15 trillion tokens drawn from a mix of visual and text data.