https://in.pinterest.com/pin/495466396524160580/visual-search/?x=85&y=596&w=177&h=74&cropSource=6&imageSignature=bac31a3502b70a08d53a9eae4179145f
Frame the problem as an ML task
accurately retrieve images that are visually similar to the image the user is searching for
query image (input) ----------> visual search system ----------> multiple similar images (output/results)
choose the right ML category
ranking problem
- recommendation systems
- search engine
- document retrieval
representation learning
- embedding vector -> a vector representation of an image
|
| x
| x |
| x x |
| x__x_______|
|__________________
n-dimensional space
dog image -------> [0.1, 0.8, -1, 0.6, 0]
std = 1, mean = 0
How to rank images using representation learning?
- embedding vector
- similarity scores
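A minimal sketch of the ranking step, assuming embeddings are already computed. The image ids and vectors below are made up for illustration, and cosine similarity is just one possible score.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_images(query_embedding, index, top_k=3):
    """Return the top_k (image_id, score) pairs, most similar first."""
    scored = [
        (image_id, cosine_similarity(query_embedding, embedding))
        for image_id, embedding in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical embeddings; a real system would get these from the trained model.
index = {
    "dog_2": [0.1, 0.7, -0.9, 0.5, 0.1],
    "lion": [0.3, 0.2, -0.4, 0.9, 0.0],
    "house": [-0.8, 0.1, 0.6, -0.2, 0.4],
}
query = [0.1, 0.8, -1.0, 0.6, 0.0]  # the dog image from the notes above
results = rank_images(query, index)
```

With these made-up vectors the other dog image ranks first and the house last. At real scale the full scan would likely be replaced by an approximate nearest-neighbour index.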
Data Engineering
- Images
- Users
- User-image interactions
- images
id, userid, upload time, image tags(labels)
(lion,animal)
(pasta, food, kitchen)
(child, family, party)
- users
id, username, age, gender, city, country, email
- user-image interactions
userid, query image id, displayed image id, position in the results list, interaction type, location, timestamp
feature engineering
operations of image preprocessing
- resizing - (224 x 224)
- scaling - pixel values to the range 0 to 1
- consistent color mode -> RGB or CMYK(cyan, magenta, yellow and black(k))
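The three preprocessing operations can be sketched in plain Python. This is an illustration only: it assumes an image is a list of rows of 8-bit pixels (grayscale ints or RGB tuples), uses nearest-neighbour resizing, and leaves out CMYK conversion. A real pipeline would more likely rely on an image library.

```python
def to_rgb(pixel):
    """Accept a grayscale int or an (r, g, b, ...) tuple; return (r, g, b)."""
    if isinstance(pixel, int):
        return (pixel, pixel, pixel)
    return tuple(pixel[:3])

def resize_nearest(image, width, height):
    """Nearest-neighbour resize of a row-major list of pixel rows."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]

def preprocess(image, size=224):
    """Convert to RGB, resize to size x size, scale channels to [0, 1]."""
    rgb = [[to_rgb(p) for p in row] for row in image]
    resized = resize_nearest(rgb, size, size)
    return [[tuple(c / 255.0 for c in p) for p in row] for row in resized]

# Tiny 2 x 2 image mixing grayscale and RGB pixels (made-up data).
tiny = [[0, 255], [(255, 0, 0), (0, 0, 255)]]
out = preprocess(tiny)
```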
model selection
- neural networks handle unstructured data such as images well
what type of neural network architectures should we use?
- CNN based architectures -> ResNet
- transformer based - ViT
model training
- the model must learn representations(embeddings)
contrastive training
dog ---> snake, dog, house, lion
construct dataset
- use human judgment
- use interaction data such as user clicks -> the data can be noisy
- artificially create a similar image from the query image (data augmentation)
self-supervision - SimCLR
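A rough sketch of the augmentation option: the positive is an augmented copy of the query and the negatives are sampled from the other images. The helper names, the horizontal flip as the only augmentation, and random negative sampling are assumptions for illustration. SimCLR itself combines several stronger augmentations, and random negatives can occasionally be similar to the query.

```python
import random

def horizontal_flip(image):
    """Mirror each row of a row-major image."""
    return [list(reversed(row)) for row in image]

def make_training_example(query_id, images, num_negatives=3, rng=None):
    """Build one (query, positive, negatives) training example."""
    rng = rng or random.Random(0)
    query = images[query_id]
    positive = horizontal_flip(query)  # augmented view of the query
    other_ids = [image_id for image_id in images if image_id != query_id]
    negative_ids = rng.sample(other_ids, min(num_negatives, len(other_ids)))
    return {
        "query": query,
        "positive": positive,
        "negative_ids": negative_ids,
        "negatives": [images[image_id] for image_id in negative_ids],
    }

# Made-up 2 x 2 single-channel images keyed by id.
images = {
    "dog": [[1, 2], [3, 4]],
    "snake": [[5, 5], [5, 5]],
    "house": [[9, 8], [7, 6]],
    "lion": [[2, 2], [4, 4]],
    "pasta": [[0, 1], [1, 0]],
}
example = make_training_example("dog", images)
```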
choosing a loss function
- compute similarities -> query image vs. other images -> dot product, cosine similarity, euclidean distance
- euclidean distance -> performs poorly on high-dimensional data
- curse of dimensionality
- softmax -> converts the similarity scores into probabilities between 0 and 1 that sum to 1
- cross entropy -> measures how close the predicted probabilities are to the ground truth (1 for the positive image, 0 for the negative images)
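The loss steps above (similarities, then softmax, then cross entropy) can be sketched for a single query. The 2-d embeddings are made up, and the dot product is used as the similarity. With a one-hot target, cross entropy reduces to the negative log of the probability assigned to the positive image.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def contrastive_loss(query, candidates, positive_index):
    """Cross entropy between softmax(similarities) and the one-hot target."""
    similarities = [dot(query, c) for c in candidates]
    probabilities = softmax(similarities)
    return -math.log(probabilities[positive_index])

# Hypothetical embeddings for: snake, dog, house, lion (the dog is the positive).
query = [1.0, 0.0]
candidates = [[0.0, 1.0], [0.9, 0.1], [-1.0, 0.0], [0.5, 0.5]]
loss = contrastive_loss(query, candidates, positive_index=1)
```

The loss shrinks as the query embedding moves closer to the positive and further from the negatives, which is what pushes the model to learn useful embeddings.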
Evaluation
offline metrics ->
- precision
- recall
- mean average precision
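A small sketch of the offline metrics for ranked results. It assumes binary relevance labels and the common definition of average precision that divides by the total number of relevant items. Other variants exist.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top k results that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    return sum(1 for item in ranked[:k] if item in relevant) / len(relevant)

def average_precision(ranked, relevant):
    """Mean of precision@i over the positions i that hold a relevant item."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant)

def mean_average_precision(queries):
    """Average of average_precision over (ranked, relevant) pairs."""
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)

# Made-up results for two queries.
queries = [
    (["a", "b", "c", "d"], {"a", "c"}),
    (["x", "y"], {"y"}),
]
map_score = mean_average_precision(queries)
```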
online metrics ->
- click through rate(CTR) - how often users click on the displayed items
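CTR can be computed from the user-image interaction log. The sketch below assumes one record per displayed image and an interaction type of either "click" or "impression". Both the field names and the values are hypothetical.

```python
def click_through_rate(interactions):
    """Clicks divided by displayed images; 0.0 when nothing was displayed."""
    shown = len(interactions)
    if shown == 0:
        return 0.0
    clicks = sum(1 for row in interactions if row["interaction_type"] == "click")
    return clicks / shown

# Made-up log rows for one results page.
log = [
    {"displayed_image_id": 11, "position": 1, "interaction_type": "click"},
    {"displayed_image_id": 12, "position": 2, "interaction_type": "impression"},
    {"displayed_image_id": 13, "position": 3, "interaction_type": "impression"},
    {"displayed_image_id": 14, "position": 4, "interaction_type": "click"},
]
ctr = click_through_rate(log)
```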
# do we use the vector representation on geo data?
Connect with me on LinkedIn: Himanshu Ramchandani