Word vectors (word embeddings) are a better way of converting a word into a numerical representation.
Stanford makes a pre-trained dataset of these embeddings, GloVe, available → https://nlp.stanford.edu/projects/glove/
You can download the dataset and run the code below to convert any English word into its vector representation.
import numpy as np

def loadGlove(path):
    # Each line of the GloVe file is: <word> <value_1> <value_2> ... <value_n>
    model = {}
    with open(path, 'r', encoding='utf8') as file:
        for line in file:
            parts = line.split()
            word = parts[0]                                        # the token itself
            vector = np.array([float(val) for val in parts[1:]])   # its embedding
            model[word] = vector
    return model

glove = loadGlove('glove.6B.50d.txt')  # 400,000 words, 50-dimensional vectors
glove['python']  # vector embedding for the word 'python'
Output →
array([ 0.5897 , -0.55043 , -1.0106 , 0.41226 , 0.57348 , 0.23464 ,
-0.35773 , -1.78 , 0.10745 , 0.74913 , 0.45013 , 1.0351 ,
0.48348 , 0.47954 , 0.51908 , -0.15053 , 0.32474 , 1.0789 ,
-0.90894 , 0.42943 , -0.56388 , 0.69961 , 0.13501 , 0.16557 ,
-0.063592, 0.35435 , 0.42819 , 0.1536 , -0.47018 , -1.0935 ,
1.361 , -0.80821 , -0.674 , 1.2606 , 0.29554 , 1.0835 ,
0.2444 , -1.1877 , -0.60203 , -0.068315, 0.66256 , 0.45336 ,
-1.0178 , 0.68267 , -0.20788 , -0.73393 , 1.2597 , 0.15425 ,
-0.93256 , -0.15025 ])
glove['neural']
Output →
array([ 0.92803 , 0.29096 , 0.67837 , 1.0444 , -0.72551 , 2.1995 ,
0.88767 , -0.94782 , 0.67426 , 0.24908 , 0.95722 , 0.18122 ,
0.064263, 0.64323 , -1.6301 , 0.94972 , -0.7367 , 0.17345 ,
0.67638 , 0.10026 , -0.033782, -0.76971 , 0.40519 , -0.099516,
0.79654 , 0.1103 , -0.076053, -0.090434, 0.015021, -1.137 ,
1.6803 , -0.34424 , 0.77538 , -1.8718 , -0.17148 , 0.31956 ,
0.093062, 0.004996, 0.25716 , 0.52207 , -0.52548 , -0.93144 ,
-1.0553 , 1.4401 , 0.30807 , -0.84872 , 1.9986 , 0.10788 ,
-0.23633 , -0.17978 ])
Here comes a simple question → How does the computer know that words are similar?
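One common answer (a minimal sketch, not necessarily the only approach) is cosine similarity → the cosine of the angle between two embedding vectors. Words whose vectors point in similar directions score close to 1, while dissimilar words score closer to 0 (or even negative). The helper below, cosineSimilarity, is a name introduced here for illustration; it assumes the glove dictionary loaded earlier.

def cosineSimilarity(a, b):
    # cos(theta) = (a · b) / (|a| * |b|)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

cosineSimilarity(glove['python'], glove['neural'])  # similarity between 'python' and 'neural'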