Visual Search System

https://in.pinterest.com/pin/495466396524160580/visual-search/?x=85&y=596&w=177&h=74&cropSource=6&imageSignature=bac31a3502b70a08d53a9eae4179145f

Frame the problem as an ML task

ML Objective

choose the right ML category

ranking problem -> rank images by their similarity to the query image
other ranking problems:
- recommendation systems
- search engine
- document retrieval

representation learning
- embedding vector -> a vector representation of an image; similar images map to nearby points in an n-dimensional space

[Figure: image embeddings plotted as points in an n-dimensional space]

dog image ------->           [0.1, 0.8, -1, 0.6, 0]
std = 1, mean = 0

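How to rank images using representation learning?

- compute an embedding vector for the query image and for every candidate image
- compute a similarity score between the query embedding and each candidate embedding, then sort the candidates by score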
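A minimal sketch of this ranking step, assuming the embeddings have already been produced by some model. The vectors below are made-up toy values and `rank_by_similarity` is a hypothetical helper, not a library function. Cosine similarity is used as the score; once the vectors are L2-normalized, a plain dot product gives the same ranking.

```python
import numpy as np


def rank_by_similarity(query_emb: np.ndarray, image_embs: np.ndarray, k: int = 3) -> list[int]:
    """Return the indices of the k images most similar to the query (cosine similarity)."""
    # L2-normalize so that the dot product equals cosine similarity
    q = query_emb / np.linalg.norm(query_emb)
    imgs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    scores = imgs @ q                         # one similarity score per image
    return np.argsort(-scores)[:k].tolist()   # highest score first


# Toy 5-dimensional embeddings (hypothetical values, not from a real model)
query = np.array([0.1, 0.8, -1.0, 0.6, 0.0])  # e.g. the dog image
collection = np.array([
    [0.1, 0.7, -0.9, 0.5, 0.1],    # 0: another dog
    [-0.8, 0.1, 0.9, -0.2, 0.3],   # 1: a house
    [0.3, 0.6, -0.4, 0.9, 0.0],    # 2: a lion
])
print(rank_by_similarity(query, collection, k=2))  # [0, 2]
```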

Data Preparation

Data Engineering

- Images
- Users
- User-image interactions

- images

id, userid, upload time, image tags (labels)
                         (lion, animal)
                         (pasta, food, kitchen)
                         (child, family, party)

- users

id, username, age, gender, city, country, email

- user-image interactions

userid, query image id, displayed image id, position in the results list, interaction type, location, timestamp
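The three record types listed above could be modeled as Python dataclasses, for example when loading logs into a training pipeline. This is only a sketch: the snake_case field names, the field types (e.g. `location` as a plain string), and the interaction-type values are assumptions, not a prescribed schema.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Image:
    id: int
    user_id: int                                   # uploader
    upload_time: datetime
    tags: list[str] = field(default_factory=list)  # labels, e.g. ["lion", "animal"]


@dataclass
class User:
    id: int
    username: str
    age: int
    gender: str
    city: str
    country: str
    email: str


@dataclass
class Interaction:
    user_id: int
    query_image_id: int
    displayed_image_id: int
    position: int           # position of the displayed image in the results list
    interaction_type: str   # hypothetical values, e.g. "click" or "impression"
    location: str           # assumption: stored as a plain string
    timestamp: datetime


# toy values for illustration
lion = Image(id=1, user_id=7, upload_time=datetime(2024, 1, 1), tags=["lion", "animal"])
event = Interaction(user_id=7, query_image_id=1, displayed_image_id=2, position=1,
                    interaction_type="click", location="unknown",
                    timestamp=datetime(2024, 1, 2, 9, 30))
```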

feature engineering

image preprocessing operations

- resizing -> a fixed size, e.g. 224 x 224 pixels
- scaling -> pixel values into the range 0 to 1
- consistent color mode -> convert all images to one mode, e.g. RGB or CMYK (cyan, magenta, yellow, and black (K))
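A dependency-light sketch of these three steps using only NumPy, assuming images arrive as 8-bit arrays. Real pipelines would more likely use an imaging library (e.g. Pillow's `Image.convert("RGB")` and `Image.resize`) with bilinear resizing; nearest-neighbour resizing is used here only to keep the example self-contained, and only grayscale and RGBA inputs are converted to RGB (CMYK input is not handled).

```python
import numpy as np


def to_rgb(img: np.ndarray) -> np.ndarray:
    """Convert an H x W (grayscale) or H x W x 4 (RGBA) array to H x W x 3 (RGB)."""
    if img.ndim == 2:                   # grayscale: repeat the single channel three times
        return np.stack([img] * 3, axis=-1)
    if img.shape[-1] == 4:              # RGBA: drop the alpha channel
        return img[..., :3]
    return img                          # assume it is already RGB


def resize_nearest(img: np.ndarray, size: int = 224) -> np.ndarray:
    """Nearest-neighbour resize of the first two axes to size x size."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return img[rows[:, None], cols]


def preprocess(img: np.ndarray, size: int = 224) -> np.ndarray:
    """Consistent color mode -> resize -> scale pixel values into [0, 1]."""
    img = resize_nearest(to_rgb(img), size)
    return img.astype(np.float32) / 255.0


raw = np.random.default_rng(0).integers(0, 256, size=(480, 640), dtype=np.uint8)  # fake grayscale photo
x = preprocess(raw)
print(x.shape, x.dtype)  # (224, 224, 3) float32
```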

Model development

model selection
- neural networks are well suited to unstructured data such as images

what type of neural network architecture should we use?
- CNN-based architectures -> e.g. ResNet
- transformer-based architectures -> e.g. ViT (Vision Transformer)

model training
- the model must learn representations (embeddings) in which similar images are close to each other

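contrastive training

query: dog image ---> candidates: snake, dog, house, lion
- the model learns to score the similar image (dog) higher than the dissimilar ones (snake, house, lion)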

constructing the dataset (how to choose the similar image for each query)

- use human judgment
- use interaction data such as user clicks -> the labels can be noisy
- artificially create a similar image from the query image (data augmentation)
  -> self-supervision, e.g. SimCLR
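A sketch of the data-augmentation option, assuming images are NumPy arrays of shape H x W x 3 with H and W of at least 192. It builds one training example in the "dog ---> snake, dog, house, lion" format: an augmented view of the query is the positive and augmented views of other images are the negatives. `augment` and `make_training_example` are hypothetical helpers. Random cropping and horizontal flipping are two of the augmentations SimCLR uses, but SimCLR also applies color distortion and blur and takes its negatives from the rest of the batch, so treat this as a simplified illustration rather than SimCLR itself.

```python
import numpy as np

rng = np.random.default_rng(0)


def augment(img: np.ndarray, crop: int = 192) -> np.ndarray:
    """Random crop plus random horizontal flip."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    view = img[top:top + crop, left:left + crop]
    if rng.random() < 0.5:
        view = view[:, ::-1]                             # flip left-right
    return view


def make_training_example(query_img, other_imgs, n_negatives=3):
    """Return (query view, shuffled candidates, index of the positive among them)."""
    positive = augment(query_img)                        # similar image created from the query
    picks = rng.choice(len(other_imgs), size=n_negatives, replace=False)
    negatives = [augment(other_imgs[i]) for i in picks]  # dissimilar images
    candidates = [positive] + negatives
    order = rng.permutation(len(candidates))             # hide the positive's position
    label = int(np.flatnonzero(order == 0)[0])           # where the positive ended up
    return augment(query_img), [candidates[i] for i in order], label


images = np.random.default_rng(1).integers(0, 256, size=(10, 224, 224, 3), dtype=np.uint8)
query_view, candidates, label = make_training_example(images[0], images[1:])
print(len(candidates), candidates[label].shape)  # 4 (192, 192, 3)
```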

choosing a loss function

- compute similarities between the query image and each other image -> dot product or cosine similarity
  - euclidean distance tends to perform poorly on high-dimensional data (curse of dimensionality)

- softmax -> converts the similarity scores into probabilities between 0 and 1 that sum to 1

- cross entropy -> measures how close the predicted probabilities are to the ground-truth labels (1 for the positive image, 0 for the negative images)
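The similarity -> softmax -> cross-entropy chain can be sketched in NumPy for a single training example. Assumptions: cosine similarity is the score and no temperature scaling is applied (contrastive methods such as SimCLR usually divide the similarities by a temperature before the softmax); `contrastive_loss` is a hypothetical helper.

```python
import numpy as np


def contrastive_loss(query_emb, candidate_embs, positive_idx):
    """Cosine similarities -> softmax -> cross-entropy against the positive candidate."""
    q = query_emb / np.linalg.norm(query_emb)
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    sims = c @ q                          # one similarity score per candidate
    exp = np.exp(sims - sims.max())       # subtracting the max keeps exp() numerically stable
    probs = exp / exp.sum()               # softmax: each value in (0, 1), all sum to 1
    loss = -np.log(probs[positive_idx])   # cross-entropy with a one-hot ground-truth label
    return loss, probs


query = np.array([1.0, 0.0])
candidates = np.array([[0.9, 0.1],    # similar to the query (e.g. the augmented dog)
                       [0.0, 1.0],    # unrelated
                       [-1.0, 0.0]])  # opposite direction
good, _ = contrastive_loss(query, candidates, positive_idx=0)
bad, _ = contrastive_loss(query, candidates, positive_idx=2)
print(good < bad)  # True
```

The loss is small when the positive already has the highest similarity and large when a dissimilar candidate outranks it, which is the signal that pushes matching images together in the embedding space.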

Evaluation

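offline metrics ->
- precision@k -> fraction of the top k results that are relevant
- recall@k -> fraction of all relevant images that appear in the top k results
- mean average precision (mAP) -> average precision of each ranked list, averaged across queries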
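Plain-Python sketches of these offline metrics, where `ranked` is the list of returned image ids (best first) and `relevant` is the set of ground-truth relevant ids; the toy ids are made up. Average precision has several definitions: this version divides by the number of relevant items found in the top k, while others divide by the total number of relevant items.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top-k results."""
    return sum(1 for item in ranked[:k] if item in relevant) / len(relevant)


def average_precision_at_k(ranked, relevant, k):
    """Average of precision@i over the positions i <= k that hold a relevant item."""
    hits, total = 0, 0.0
    for i, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / i          # precision at this relevant position
    return total / hits if hits else 0.0


def mean_average_precision(queries, k):
    """mAP: AP@k averaged over queries, each given as a (ranked, relevant) pair."""
    return sum(average_precision_at_k(r, rel, k) for r, rel in queries) / len(queries)


ranked = ["a", "b", "c", "d", "e"]   # toy result ids for one query, best first
relevant = {"a", "c", "f"}           # toy ground-truth relevant ids
print(precision_at_k(ranked, relevant, 5))  # 0.4
```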

online metrics ->
- click-through rate (CTR) -> how often users click on the displayed items (clicked items / displayed items)
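A small sketch of the CTR calculation over logged search results. `DisplayedResult` is a hypothetical, simplified log record (one row per displayed item with a clicked flag); a real interaction log may store clicks and impressions as separate events.

```python
from dataclasses import dataclass


@dataclass
class DisplayedResult:
    """One search result shown to a user (hypothetical, simplified log record)."""
    displayed_image_id: int
    position: int        # position in the results list
    clicked: bool


def click_through_rate(results: list[DisplayedResult]) -> float:
    """CTR = clicked results / displayed results."""
    if not results:
        return 0.0
    return sum(r.clicked for r in results) / len(results)


# toy log: four displayed results, two of them clicked
log = [DisplayedResult(image_id, pos, clicked)
       for pos, (image_id, clicked) in enumerate([(11, True), (12, False), (13, False), (14, True)], start=1)]
print(click_through_rate(log))  # 0.5
```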

# open question: can vector representations (embeddings) also be used for geo data?

create an in-depth architecture based on the above information
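One possible starting point, sketched in Python under stated assumptions: an offline pipeline embeds every image once and stores the vectors in an index, and an online pipeline embeds the query image and returns the nearest neighbours. `VisualSearchService` and `embed` are hypothetical; `embed` is a random projection standing in for a trained ResNet or ViT encoder, and exact brute-force search stands in for an approximate nearest-neighbour library (such as FAISS) that a production system would more likely use.

```python
import numpy as np


class VisualSearchService:
    """Sketch of an offline indexing pipeline plus an online query pipeline."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.ids: list[int] = []
        self.embeddings = None  # (num_images, dim) matrix of L2-normalized rows

    def index(self, image_ids, images):
        """Offline: embed each image once and store the normalized vectors."""
        embs = np.stack([self.embed_fn(img) for img in images]).astype(np.float64)
        embs = embs / np.linalg.norm(embs, axis=1, keepdims=True)
        self.ids.extend(image_ids)
        self.embeddings = embs if self.embeddings is None else np.vstack([self.embeddings, embs])

    def search(self, query_image, k=5, exclude_id=None):
        """Online: embed the query, score every indexed image, return the top-k ids."""
        q = self.embed_fn(query_image).astype(np.float64)
        scores = self.embeddings @ (q / np.linalg.norm(q))  # cosine similarities
        results = []
        for i in np.argsort(-scores):                       # best match first
            if self.ids[i] != exclude_id:                   # e.g. skip the query image itself
                results.append(self.ids[i])
            if len(results) == k:
                break
        return results


rng = np.random.default_rng(0)
projection = rng.normal(size=(8 * 8 * 3, 16))  # hypothetical stand-in for a trained encoder


def embed(img):
    """Map a toy 8 x 8 RGB image to a 16-dimensional embedding."""
    return img.reshape(-1) @ projection


images = rng.random((10, 8, 8, 3))  # toy "images"
service = VisualSearchService(embed)
service.index(list(range(10)), images)
print(service.search(images[3], k=3)[0])  # 3: an image is most similar to itself
```

In practice the index would be rebuilt or updated as new images are uploaded, and the online results could be filtered further (for example removing duplicates or the query image itself via `exclude_id`).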

Connect with me on LinkedIn: Himanshu Ramchandani