YouTube
We have to design a search system for videos.
- input is text query
- output is a list of videos that are relevant to the text query
to solve this problem we can divide this into 2 parts
- visual content
- textual content
ML Objective -> rank videos based on their relevance to the text query
"learn python as a beginner"
Text Query - text search
- visual search
- text search -> text encoder -> text embedding [0.2,0.6,-0.9,-0.3] # NLP
- visual search -> video encoder -> video embedding [0.1,0.8,-1,-0.7]
dot product -> text and each video in the embedding space
get the rank of the videos based on their similarity score
# Data Engineering
DE is not needed in this case
- dataset is already annotated
video file query split
1234.mp4 celebration of worldcup winning trainig
- Text Normalization
Lowercasing, punctuation removal(?,!), trim whitespaces, strip accents, lemmatization and stemming
- Tokenization
word tokenization, subword tokenization (n-gram characters), character tokenization(set of characters)
- Tokens to IDs
words to numerical values
Lookup table-> animal - 18, car - 128,
Hashing(feature hashing) - animal - 4, car - 1
l = [10,20]
l[0]
{'indore': 10}
video -> frames -> resize -> scale, normalize, color changes -> frames as numerical values
text encoder