Tokens are often assumed to be individual words, or chunks of 3 to 4 characters, but that is not accurate.
Tokens can be individual or partial words, as seen in the above image.
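To make the idea of partial-word tokens concrete, here is a minimal sketch of a toy tokenizer. It uses a greedy longest-match rule over a small, hypothetical vocabulary (`VOCAB` and `tokenize` are illustrative names, not part of any library). Real GPT tokenizers use byte-pair encoding with a learned vocabulary, so this only imitates the kind of output they produce.

```python
# Hypothetical toy vocabulary; real tokenizers learn tens of thousands of entries.
VOCAB = ["token", "ization", "un", "believ", "able", " ", "s"]


def tokenize(text, vocab=VOCAB):
    """Split text by greedy longest match against the vocabulary.

    Characters that match no vocabulary entry become single-character tokens.
    """
    pieces = sorted(vocab, key=len, reverse=True)  # try longer pieces first
    tokens = []
    i = 0
    while i < len(text):
        for piece in pieces:
            if text.startswith(piece, i):
                tokens.append(piece)
                i += len(piece)
                break
        else:
            # No vocabulary entry matched at position i.
            tokens.append(text[i])
            i += 1
    return tokens


print(tokenize("tokenization"))         # ['token', 'ization']
print(tokenize("unbelievable tokens"))  # ['un', 'believ', 'able', ' ', 'token', 's']
```

Note that one word can become several tokens, and that the space is a token of its own in this toy setup.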
Large Language Models use tokens to measure 3 things →
OpenAI tokenizer - Himanshu Ramchandani
Tokens are then converted into numeric embeddings, because every type of model processes only numbers.
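The conversion from tokens to numbers can be sketched as a two-step lookup: each token maps to an integer ID, and each ID selects one vector from an embedding table. Everything below is an assumption made for illustration: the three-entry `VOCAB`, the dimension of 4, and the random vectors, which a real model would learn during training.

```python
import random

# Hypothetical token -> ID table; a real vocabulary is far larger.
VOCAB = {"token": 0, "ization": 1, "s": 2}
EMBED_DIM = 4  # illustrative; real models use hundreds or thousands of dimensions

random.seed(0)
# One vector per token ID. Random here, learned during training in a real model.
EMBEDDINGS = [[random.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in VOCAB]


def embed(tokens):
    """Map tokens to integer IDs, then look up each ID's embedding vector."""
    ids = [VOCAB[t] for t in tokens]
    vectors = [EMBEDDINGS[i] for i in ids]
    return ids, vectors


ids, vectors = embed(["token", "ization"])
print(ids)                            # [0, 1]
print(len(vectors), len(vectors[0]))  # 2 4
```

The model never sees the text itself, only the list of vectors that the lookup returns.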
GPT-3's training corpus contained roughly 500 billion tokens.
GPT-3 has 175 billion parameters.
Both statements are true: tokens measure the text the model was trained on, while parameters are the weights the model learned.
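To see why parameters are a different quantity from tokens, the 175 billion figure can be roughly reproduced from the model's shape alone. The sketch below assumes the figure refers to GPT-3 and uses its published configuration (96 layers, hidden size 12288, vocabulary of 50257 tokens). It also relies on the common rule of thumb of about 12 × d_model² weights per transformer layer, which ignores biases, layer norms, and position embeddings, so treat the result as an estimate.

```python
# Published GPT-3 configuration (assumed to be the model the text refers to).
N_LAYERS = 96
D_MODEL = 12288
VOCAB_SIZE = 50257


def estimate_parameters(n_layers, d_model, vocab_size):
    """Approximate parameter count of a GPT-style transformer.

    Per layer: about 4 * d_model^2 for attention plus 8 * d_model^2 for the
    feed-forward block. The token embedding table adds vocab_size * d_model.
    """
    per_layer = 12 * d_model ** 2
    embedding_table = vocab_size * d_model
    return n_layers * per_layer + embedding_table


total = estimate_parameters(N_LAYERS, D_MODEL, VOCAB_SIZE)
print(f"{total / 1e9:.1f}B parameters")  # 174.6B parameters
```

No token count appears anywhere in this calculation: the number of parameters is fixed by the architecture, while the number of tokens describes how much text was used to train it.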