Day 4 - Recorded session

https://youtu.be/86wflZLc4sQ

https://drive.google.com/file/d/12Ky8Mv-86C3ECFBHzkVq0xoPrOr-CM23/view?usp=sharing

Components of a production-ready ML system

Data Collection              Data Verification              Configuration

Feature Extraction           ML Algorithms                  Analysis Tools

Monitoring                   Process Management Tools       Machine Resource Management

Service Infrastructure       Evaluation Pipeline

The design steps (framework)

1 - Requirements (Business Objective)
2 - Frame your ML Task
3 - Data Preparation
4 - Model Development
5 - Evaluation
6 - Deployment
7 - Monitoring

1 - Requirements

- business objective -> e.g. increase revenue or increase the number of registrations
- features the system needs to support -> interaction data
- data -> what are the sources? how large are the datasets? is the data labeled?
- constraints -> how much computing power is available? is the system cloud-based? is the model expected to improve automatically over time?
- scale of the system -> how many users do we have?
- performance -> how fast must predictions be? which has priority: accuracy or latency?

2 - Frame your ML Task

- define your ML Objective
- specifying the system's input and output
- selecting the right ML category

- define your ML Objective

business objective                                        ML objective
- (YouTube) increase user engagement                      maximize the time a user spends watching videos
- (Instagram) improve platform safety                     accurately predict whether a piece of content is harmful
- (BookMyShow) increase ticket sales                      maximize the number of event registrations

- specifying the system's input and output

input          algorithm                            output
post           harmful content detection system     probability

input          algorithm                            output
user           model                                probability
events
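
A minimal sketch of the post -> harmful content detection system -> probability specification above. The `Post` fields, the `FLAGGED_TERMS` list, and the scoring rule are all hypothetical placeholders standing in for a trained model; only the shape of the input and output is the point.

```python
from dataclasses import dataclass


@dataclass
class Post:
    """Input to the system: a single post (hypothetical schema)."""
    post_id: int
    text: str


# Hypothetical keyword list standing in for a trained classifier.
FLAGGED_TERMS = {"scam", "violence", "hate"}


def predict_harmful_probability(post: Post) -> float:
    """Output of the system: a score in [0, 1] that the post is harmful.

    Placeholder logic: the share of flagged words, scaled by 3 and capped
    at 1.0. A real system would call a trained model here instead.
    """
    words = post.text.lower().split()
    if not words:
        return 0.0
    flagged = sum(1 for word in words if word in FLAGGED_TERMS)
    return min(1.0, 3 * flagged / len(words))


safe_score = predict_harmful_probability(Post(1, "lovely sunset at the beach"))
risky_score = predict_harmful_probability(Post(2, "this scam promotes hate"))
print(safe_score, risky_score)
```

Fixing the input type and the output range first makes it possible to swap the placeholder scoring for a real model later without changing the callers.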

- selecting the right ML category

                                    ML Categories
Supervised                          Unsupervised                            Reinforcement
Regression                          clustering
Classification                      dimensionality reduction
    - binary
    - multiclass
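
The category choice above can be sketched as a small decision helper. The function name, its parameters, and the order of the checks are assumptions made for illustration, not a standard API.

```python
def select_ml_category(
    has_labels: bool,
    learns_from_rewards: bool = False,
    target_is_continuous: bool = False,
    num_classes: int = 0,
) -> str:
    """Map a rough task description to one of the categories in the notes."""
    if learns_from_rewards:
        return "reinforcement"
    if not has_labels:
        # No labels: group similar items or compress the features.
        return "unsupervised: clustering or dimensionality reduction"
    if target_is_continuous:
        return "supervised: regression"
    if num_classes < 2:
        raise ValueError("a classification task needs at least 2 classes")
    if num_classes == 2:
        return "supervised: binary classification"
    return "supervised: multiclass classification"


# Harmful vs. not harmful content: two labeled classes.
print(select_ml_category(has_labels=True, num_classes=2))
# Predicting a continuous quantity such as watch time (an assumed framing).
print(select_ml_category(has_labels=True, target_is_continuous=True))
```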

3 - Data Preparation

data sources -----> data engineering -> feature engineering -----> prepared features
                           data preparation process

data engineering -> designing and building pipelines for collecting, storing, retrieving, and processing data.

data sources ->
- who collected the data?
- how clean is the data?
- can the source be trusted?
- is the data user-generated or system-generated?

data storage ->
- a high-level understanding of how different databases work

SQL

Relational Database

- MySQL
- PostgreSQL

NoSQL

Key/value      -> Redis, DynamoDB
Column-based   -> Cassandra, HBase
Graph          -> Neo4j
Document       -> MongoDB, CouchDB
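
A rough sketch contrasting three of the storage models listed above, using only the Python standard library. The in-memory SQLite database is a real relational engine; the plain dict and the list of dicts are only stand-ins for a key/value store and a document store (they are not Redis or MongoDB clients). All table names, keys, and records are hypothetical.

```python
import sqlite3

# Relational (SQL): a fixed schema, queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO users (id, name, city) VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ravi", "Delhi")],
)
pune_users = conn.execute(
    "SELECT name FROM users WHERE city = ?", ("Pune",)
).fetchall()
conn.close()

# Key/value: values are looked up by key only (dict as a stand-in).
kv_store = {}
kv_store["session:1"] = "token-abc"
session = kv_store["session:1"]

# Document: self-describing records whose fields may differ per record.
documents = [
    {"_id": 1, "name": "Asha", "tags": ["music"]},
    {"_id": 2, "name": "Ravi", "address": {"city": "Delhi"}},
]
delhi_docs = [d for d in documents if d.get("address", {}).get("city") == "Delhi"]

print(pune_users, session, delhi_docs)
```

The relational table rejects rows that do not fit its columns, while the document records carry different fields side by side; that difference is the main thing to keep in mind when choosing between them.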

ETL -> Extract, Transform, and Load

the ETL process ->

Extract (from data sources)       Transform                          Load (to target destination)
- databases                                                          - database
- logs                                                               - files
- files                                                              - data warehouse
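
The three ETL stages can be sketched end to end with the standard library. This is a toy example under stated assumptions: the CSV text stands in for a source file, an in-memory SQLite database stands in for the target destination, and the column names and the "drop rows with a missing duration" rule are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw source data; the second row has a missing duration.
RAW_CSV = """user_id,event,duration_sec
1,watch,120
2,watch,
1,watch,45
"""


def extract(raw_text):
    """Extract: read rows from the source as dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: drop incomplete rows and convert fields to proper types."""
    cleaned = []
    for row in rows:
        if not row["duration_sec"]:
            continue  # skip rows with a missing duration
        cleaned.append((int(row["user_id"]), row["event"], int(row["duration_sec"])))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events "
        "(user_id INTEGER, event TEXT, duration_sec INTEGER)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
total_user_1 = conn.execute(
    "SELECT SUM(duration_sec) FROM events WHERE user_id = 1"
).fetchone()[0]
conn.close()
print(row_count, total_user_1)
```

Keeping each stage as its own function makes it easy to change the source or the destination without touching the cleaning logic.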

Data Types in ML

- structured - numerical (discrete and continuous), categorical (ordinal, nominal)
             - predefined schema
             - easy to search
             - relational database, data warehouse

- unstructured - audio, video, image, text
               - no schema
               - difficult to search
               - NoSQL databases, data lakes
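
For the structured types above, a small sketch of how each kind might be turned into model features: numerical values pass through, ordinal categories map to their rank, and nominal categories are one-hot encoded. The record, the category lists, and the helper names are hypothetical, and these encodings are common choices rather than the only ones.

```python
# Ordinal: the categories have a meaningful order.
SIZE_ORDER = ["small", "medium", "large"]
# Nominal: the categories have no order.
CITIES = ["Delhi", "Mumbai", "Pune"]


def encode_ordinal(value, order):
    """Map an ordinal category to its position in the known order."""
    return order.index(value)


def encode_one_hot(value, categories):
    """Map a nominal category to a 0/1 vector with a single 1."""
    return [1 if value == category else 0 for category in categories]


# Hypothetical structured record with a predefined schema.
record = {"age": 31, "price": 249.5, "size": "medium", "city": "Pune"}

features = (
    [record["age"], record["price"]]                 # discrete, continuous
    + [encode_ordinal(record["size"], SIZE_ORDER)]   # ordinal
    + encode_one_hot(record["city"], CITIES)         # nominal
)
print(features)
```

One-hot encoding is used for the city so that the model does not read an order into categories that have none.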