Designing a Video Search System

YouTube

1 - Requirements

We have to design a search system for videos.

- Input: a text query
- Output: a list of videos ranked by their relevance to the text query

To solve this problem, we can divide it into two parts:

- visual content
- textual content

2 - ML Problem

ML Objective -> rank videos based on their relevance to the text query

Example text query: "learn python as a beginner"
The query is handled by two components:
- text search
- visual search

- text query -> text encoder -> text embedding [0.2, 0.6, -0.9, -0.3]  (NLP)
- each video -> video encoder -> video embedding [0.1, 0.8, -1.0, -0.7]

Compute the dot product between the text embedding and each video embedding in the shared embedding space; the result is a similarity score.
Rank the videos by their similarity score, highest first.
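A minimal sketch of the ranking step in pure Python, assuming both encoders have already produced their embeddings. `video_a` uses the example vector from the notes; `video_b` and `video_c` are hypothetical vectors added only for illustration.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_videos(query_emb, video_embs):
    """Score every video against the query and sort by score, highest first."""
    scores = {vid: dot(query_emb, emb) for vid, emb in video_embs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


query_emb = [0.2, 0.6, -0.9, -0.3]          # text embedding from the notes
video_embs = {
    "video_a": [0.1, 0.8, -1.0, -0.7],      # video embedding from the notes
    "video_b": [-0.5, 0.1, 0.4, 0.2],       # hypothetical
    "video_c": [0.3, 0.2, -0.1, 0.0],       # hypothetical
}

ranking = rank_videos(query_emb, video_embs)
for vid, score in ranking:
    print(vid, round(score, 2))
# video_a 1.61
# video_c 0.27
# video_b -0.46
```

Scoring every video one by one like this is only practical for small collections; at YouTube scale, systems typically use approximate nearest-neighbor search over precomputed video embeddings instead.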

3 - Data Preparation

# Data Engineering

Data engineering (DE) is not needed in this case:
- the dataset is already annotated (each video file is paired with a text query and a data split)

video file   query                              split
1234.mp4     celebration of worldcup winning    training
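A small sketch of reading the annotations and grouping them by split, assuming (hypothetically) that they are stored as a tab-separated file with the three columns above. The single row is the example from the notes.

```python
import csv
import io
from collections import defaultdict

# Hypothetical tab-separated annotation file; the row is the example from the notes.
ANNOTATIONS = (
    "video_file\tquery\tsplit\n"
    "1234.mp4\tcelebration of worldcup winning\ttraining\n"
)


def load_by_split(text):
    """Group (video_file, query) pairs by the value of their split column."""
    splits = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        splits[row["split"]].append((row["video_file"], row["query"]))
    return dict(splits)


data = load_by_split(ANNOTATIONS)
print(data["training"])  # [('1234.mp4', 'celebration of worldcup winning')]
```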

Feature Engineering

preparation of text data

- Text Normalization
Lowercasing, punctuation removal (?, !), trimming whitespace, stripping accents, lemmatization and stemming
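The normalization steps listed above can be sketched with the Python standard library alone. This is a minimal version under that assumption: lemmatization and stemming are left out because they normally come from a dedicated NLP library.

```python
import re
import string
import unicodedata


def normalize(text):
    """Lowercase, strip accents, remove punctuation, and collapse whitespace."""
    text = text.lower()
    # Strip accents: decompose characters (NFKD), then drop combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Remove ASCII punctuation such as ? and !.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Collapse runs of whitespace and trim both ends.
    return re.sub(r"\s+", " ", text).strip()


print(normalize("  Learn PYTHON as a Beginner?! Café  "))
# learn python as a beginner cafe
```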

- Tokenization
Word tokenization, subword tokenization (e.g. character n-grams), character tokenization (split text into individual characters)
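A sketch of the three tokenization levels on normalized text. The word tokenizer simply splits on whitespace, and the subword example uses character n-grams with `<` and `>` word-boundary markers, which is one common convention (assumed here); learned subword schemes such as BPE or WordPiece are not shown.

```python
def word_tokenize(text):
    """Word tokenization: split on whitespace."""
    return text.split()


def char_tokenize(text):
    """Character tokenization: one token per character."""
    return list(text)


def char_ngrams(word, n=3):
    """Subword tokenization via character n-grams, with boundary markers."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


print(word_tokenize("learn python"))  # ['learn', 'python']
print(char_tokenize("learn"))         # ['l', 'e', 'a', 'r', 'n']
print(char_ngrams("learn"))           # ['<le', 'lea', 'ear', 'arn', 'rn>']
```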

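- Tokens to IDs
Convert tokens (words) to numerical IDs. Two common approaches:

Lookup table -> store an explicit token-to-ID mapping, e.g. animal - 18, car - 128
Hashing (feature hashing) -> compute the ID as hash(token) mod number of buckets, e.g. animal - 4, car - 1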

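In Python terms, a lookup table works like a dictionary that maps a token to its ID, e.g. {'indore': 10}, whereas a list such as l = [10, 20] is indexed by position, so l[0] returns 10.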
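The two token-to-ID approaches can be contrasted in a short sketch. The vocabulary uses the example IDs from the notes; `UNK_ID` and the bucket count are hypothetical choices, and `zlib.crc32` stands in for the hash function because it is deterministic across runs (Python's built-in `hash()` on strings is not). The hashed IDs it produces will not match the illustrative values 4 and 1 above.

```python
import zlib

VOCAB = {"animal": 18, "car": 128}  # lookup table with the example IDs from the notes
UNK_ID = 0                          # hypothetical ID for out-of-vocabulary tokens


def lookup_ids(tokens):
    """Lookup table: explicit mapping, unknown tokens fall back to UNK_ID."""
    return [VOCAB.get(t, UNK_ID) for t in tokens]


def hashed_ids(tokens, num_buckets=1000):
    """Feature hashing: ID = hash(token) mod num_buckets; no stored vocabulary."""
    return [zlib.crc32(t.encode("utf-8")) % num_buckets for t in tokens]


print(lookup_ids(["animal", "car", "dog"]))  # [18, 128, 0]
print(hashed_ids(["animal", "car", "dog"]))  # three IDs in the range [0, 1000)
```

The trade-off: a lookup table must be stored and cannot assign IDs to words it has never seen, while hashing needs no stored vocabulary and handles new words, at the cost of collisions where different tokens share an ID.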

Text preparation recap: text normalization -> tokenization -> tokens to IDs

preparation of video data

video -> decode into frames -> resize each frame -> scale pixel values, normalize, convert color format -> frames as numerical arrays
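A NumPy sketch of the frame pipeline, under several assumptions: the video has already been decoded into a `uint8` array of shape (T, H, W, 3), "color changes" is taken to mean converting BGR to RGB, resizing uses simple nearest-neighbor sampling, and normalization uses the commonly used ImageNet per-channel mean and standard deviation. The `step` parameter (keeping every k-th frame) is a hypothetical addition.

```python
import numpy as np

# Commonly used ImageNet RGB statistics (an assumption; use your encoder's values).
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def resize_nearest(frame, out_h, out_w):
    """Nearest-neighbor resize of one (H, W, 3) frame."""
    h, w = frame.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return frame[rows[:, None], cols]


def preprocess_frames(frames, size=224, step=1, bgr=False):
    """uint8 (T, H, W, 3) -> float32 (T', size, size, 3) normalized frames."""
    sampled = frames[::step]                     # keep every `step`-th frame
    if bgr:
        sampled = sampled[..., ::-1]             # BGR -> RGB color conversion
    resized = np.stack([resize_nearest(f, size, size) for f in sampled])
    scaled = resized.astype(np.float32) / 255.0  # scale pixels to [0, 1]
    return (scaled - MEAN) / STD                 # per-channel normalization


frames = np.random.randint(0, 256, size=(8, 480, 640, 3), dtype=np.uint8)
batch = preprocess_frames(frames, size=224, step=2)
print(batch.shape, batch.dtype)  # (4, 224, 224, 3) float32
```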

4 - Model Development

text encoder