https://drive.google.com/file/d/12Ky8Mv-86C3ECFBHzkVq0xoPrOr-CM23/view?usp=sharing
Components of an ML system (diagram): data collection, data verification, configuration, feature extraction, ML algorithms, analysis tools, service infrastructure, monitoring, process management tools, machine resource management, evaluation pipeline
1 - Requirements (Business Objective)
2 - Frame your ML Task
3 - Data Preparation
4 - Model Development
5 - Evaluation
6 - Deployment
7 - Monitoring
- business objective -> increase revenue or increase the number of registrations
- features the system needs to support -> interaction data
- data -> what are the sources, how large is the dataset, is the data labeled?
- constraints -> how much computing power is available, is the system cloud-based, is the model expected to improve automatically over time?
- scale of the system -> how many users do we have?
- performance -> how fast must predictions be? what is the priority, accuracy or latency?
- define your ML Objective
- specifying the system's input and output
- selecting the right ML category
- define your ML Objective
business objective -> ML objective
- (YouTube) increase user engagement -> maximize the time a user spends watching videos
- (Instagram) improve platform safety -> accurately predict whether a piece of content is harmful
- (BookMyShow) increase ticket sales -> maximize the number of event registrations
- specifying the system's input and output
input -> algorithm -> output
post -> harmful content detection system -> probability
input -> model -> output
user, events -> model -> probability
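The input/output specification for the harmful content example can be written down as a function signature. The sketch below is only an illustration: `Post`, `FLAGGED_TERMS`, and `harmful_probability` are hypothetical names, and the word-matching rule is a toy stand-in for a trained model. The point is the contract: a post goes in, a probability between 0 and 1 comes out.

```python
from dataclasses import dataclass

# Hypothetical word list standing in for whatever a trained model would learn.
FLAGGED_TERMS = {"scam", "attack"}


@dataclass
class Post:
    """System input: a single piece of user content."""
    text: str


def harmful_probability(post: Post) -> float:
    """System output: a value in [0, 1].

    Toy rule (an assumption, not a real model): the fraction of words
    in the post that appear in FLAGGED_TERMS.
    """
    words = post.text.lower().split()
    if not words:
        return 0.0
    flagged = sum(1 for word in words if word in FLAGGED_TERMS)
    return flagged / len(words)


score = harmful_probability(Post(text="this offer is a scam"))
```

Fixing the signature first makes it possible to swap the toy rule for a real classifier later without changing the callers.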
- selecting the right ML category
ML Categories
- Supervised -> regression, classification (binary, multiclass)
- Unsupervised -> clustering, dimensionality reduction
- Reinforcement
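One practical difference between binary and multiclass classification is the shape of the output. A common convention (assumed here, not stated in the notes) is that a binary classifier emits one probability through a sigmoid, while a multiclass classifier emits one probability per class through a softmax. A minimal sketch:

```python
import math


def sigmoid(score: float) -> float:
    """Binary case: map one raw score to a single probability."""
    return 1.0 / (1.0 + math.exp(-score))


def softmax(scores: list[float]) -> list[float]:
    """Multiclass case: map one raw score per class to probabilities that sum to 1."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


binary_output = sigmoid(0.0)                   # one number
multiclass_output = softmax([1.0, 2.0, 3.0])   # one number per class
```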
data preparation process: data sources -> data engineering -> feature engineering -> prepared features
data engineering -> designing and building pipelines for collecting, storing, retrieving, and processing data.
data sources ->
- who collected the data
- how clean the data is
- can the source be trusted
- is the data user generated or system generated
data storage ->
- a high-level understanding of how different databases work
SQL
- Relational -> MySQL, PostgreSQL
NoSQL
- Key/value -> Redis, DynamoDB
- Column-based -> Cassandra, HBase
- Graph -> Neo4j
- Document -> MongoDB, CouchDB
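To make the relational versus key/value distinction concrete, here is a small sketch using only the Python standard library. The in-memory SQLite database stands in for a relational store, and a plain `dict` stands in for a key/value store such as Redis. The table, keys, and sample rows are made up for illustration.

```python
import sqlite3

# Relational: fixed schema, rows can be filtered on any column with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO users (id, name, city) VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ravi", "Delhi")],
)
pune_users = conn.execute(
    "SELECT name FROM users WHERE city = ?", ("Pune",)
).fetchall()

# Key/value: lookups go through the key only; the value is an opaque blob.
kv_store = {
    "user:1": {"name": "Asha", "city": "Pune"},
    "user:2": {"name": "Ravi", "city": "Delhi"},
}
user_1 = kv_store["user:1"]
```

The relational query can filter by `city` directly, while the key/value lookup is fast but needs the exact key up front.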
ETL -> Extract, Transform, and Load
the ETL process ->
- Extract -> from data sources (databases, logs, files)
- Transform
- Load -> to the target destination (database, files, data warehouse)
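The three ETL stages can be sketched as three small functions. This is a minimal illustration under stated assumptions: the source is an inline CSV string, the target is an in-memory SQLite table, and the column names and the cleaning rule (drop rows with a missing amount) are invented for the example.

```python
import csv
import io
import sqlite3

# Hypothetical raw source data; row 2 has a missing amount.
RAW_CSV = """user_id,event,amount
1,purchase, 19.99
2,purchase,
3,refund,5.00
"""


def extract(raw: str) -> list[dict]:
    """Extract: read rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows: list[dict]) -> list[tuple]:
    """Transform: strip whitespace, cast types, drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue
        cleaned.append((int(row["user_id"]), row["event"].strip(), float(amount)))
    return cleaned


def load(rows: list[tuple], conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (user_id INTEGER, event TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Keeping each stage as its own function makes it easy to test the cleaning rules without touching the source or the target.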
Data Types in ML
- structured -> numerical (discrete, continuous), categorical (ordinal, nominal)
  - predefined schema
  - easy to search
  - relational databases, data warehouses
- unstructured -> audio, video, image, text
  - no schema
  - difficult to search
  - NoSQL databases, data lakes
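The ordinal versus nominal distinction listed under structured data matters when categories are turned into numbers. A common approach, sketched here with made-up categories, is to map ordinal values to integers that preserve their order and to one-hot encode nominal values so that no artificial order is implied. The function names are hypothetical.

```python
# Ordinal: the categories have a meaningful order.
SIZE_ORDER = ["small", "medium", "large"]

# Nominal: the categories have no order.
COLORS = ["red", "green", "blue"]


def encode_ordinal(value: str, order: list[str]) -> int:
    """Return the position of the value in its ordered category list."""
    return order.index(value)


def encode_one_hot(value: str, categories: list[str]) -> list[int]:
    """Return a vector with a 1 at the matching category and 0 elsewhere."""
    return [1 if value == category else 0 for category in categories]


size_code = encode_ordinal("medium", SIZE_ORDER)
color_vector = encode_one_hot("green", COLORS)
```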