This is part of the feature engineering process in an NLP pipeline.

Text Normalization

Normalization means keeping all the words on the same scale.

All the words should pass through the NLP pre-processing pipeline we discussed earlier.

v42.png

In any format, the text is, it will be normalized into, as mentioned above.

Tokenization

It means we are breaking down the whole corpus of text into smaller chunks.

There are different ways to do that →