
The (Google Cloud) Machine Learning Workflow

Rosario0g3nio
3 min read · Oct 6, 2022

In a traditional computational algorithm, the program we build executes its tasks according to the specific, explicit instructions it was built to follow, which is a major limitation of traditional computer programs.

With Machine Learning (ML) algorithms, we follow a different approach. We build the ML models so they can learn for themselves and find solutions for problems without being explicitly programmed to do so.

For the learning process, we feed the ML model a huge amount of data so it can learn from it and, after learning, make smart decisions on its own.

After finding the right data for the problem we want to solve, we pick or develop the ML model that best suits the learning problem. To that end, Google Cloud offers options like BigQuery ML, AutoML, Pre-built APIs, and Custom Training.

An end-to-end ML workflow is as follows:

Prerequisites

1. Lots of storage: We need a huge amount of storage to hold the data, e.g. Cloud Storage.

2. Computing power: The whole learning process and all of its iterations demand a considerable amount of computing power, which is why cloud computing is a good fit.

The learning process consists of the following main stages:

1. Data Preparation

Steps of data preparation

1.1. Data Uploading: The data can be tabular, text, images, videos, etc., and the source can be local storage, BigQuery, or Cloud Storage.

1.2. Feature engineering

Here we extract features (properties) from the data to better organize and understand it, so that the model performs better during the training stage. We do this using data engineering techniques to pre-process the data (clean and organize it) for later use.
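As a minimal sketch of what feature engineering can look like, here are two classic transformations written in plain Python: min-max scaling for a numeric column and one-hot encoding for a categorical one. The column values are invented for illustration; on Google Cloud this step would typically be handled by tools like Dataflow or BigQuery rather than hand-written code.

```python
def min_max_scale(values):
    """Rescale numeric values to the [0, 1] range."""
    lo, hi = min(values), max(values)
    span = hi - lo or 1  # avoid division by zero for constant columns
    return [(v - lo) / span for v in values]

def one_hot(values):
    """Map each category to a binary indicator vector."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

# Hypothetical raw columns
ages = [22, 35, 58]
countries = ["BR", "US", "BR"]

scaled = min_max_scale(ages)   # [0.0, 0.3611..., 1.0]
encoded = one_hot(countries)   # [[1, 0], [0, 1], [1, 0]]
```

Scaling keeps features on comparable ranges so no single column dominates training, and one-hot encoding turns categories into numbers a model can consume.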

The other concern to take into account is the data type. The data types can be:

1. streaming vs. batch data

2. structured data (tabular, e.g. numbers and text) vs. unstructured data (non-tabular, e.g. images, audio files…).

The data type determines how we handle the data during the feature engineering stage, as well as the amount of computing power and time required for the whole learning phase.

2. Model Training

This stage consists of a train-evaluate cycle used to fit the model to the data.

We start the training process, then evaluate, train again, then evaluate again, until we reach the best possible result.
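The train-evaluate cycle above can be sketched with a toy example: a one-parameter linear model y ≈ w·x fitted by gradient descent on a training set, with a held-out validation set used for evaluation. The data points and learning rate are made up for illustration.

```python
def mse(w, data):
    """Mean squared error of the model y = w * x on (x, y) pairs."""
    return sum((y - w * x) ** 2 for x, y in data) / len(data)

def train_step(w, data, lr=0.01):
    """One gradient-descent update on the training data."""
    grad = sum(-2 * x * (y - w * x) for x, y in data) / len(data)
    return w - lr * grad

train = [(1, 2.1), (2, 3.9), (3, 6.2)]  # roughly y = 2x
val = [(4, 8.1), (5, 9.8)]              # held out for evaluation

w = 0.0
for epoch in range(200):
    w = train_step(w, train)
    # In practice we would evaluate on `val` each epoch and stop
    # once the validation error plateaus (early stopping).

val_error = mse(w, val)
```

The key point the loop illustrates: training only ever sees `train`, while the quality check that decides when to stop comes from `val`.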

In the training phase we can use a supervised learning algorithm (e.g. classification, regression) or an unsupervised learning algorithm (e.g. clustering, association, dimensionality reduction), depending on the type of problem we want to solve.

Fortunately, with Google Cloud, when using options like AutoML or Pre-built APIs, we don’t need to specify the ML model we want to use to solve a given task; all we have to do is define the objective (e.g. image classification or text translation), and Google will select the best model that meets our business needs.

With BigQuery ML and Custom Training, on the other hand, we need to specify which model we want to use to train on our data.

After training the model, we do model evaluation: the process of using different metrics to understand a machine learning model’s performance, as well as its strengths and weaknesses.
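For a binary classifier, a few of those standard metrics can be computed from scratch as shown below; the true and predicted label lists are invented for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)  # fraction of correct predictions
precision = tp / (tp + fp)          # of the predicted positives, how many were right
recall = tp / (tp + fn)             # of the actual positives, how many were found
```

Comparing precision and recall, rather than accuracy alone, is one way to surface a model’s specific strengths and weaknesses.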

3. Model “Serving”

This is when the model is deployed, monitored and managed.

In almost all cases when building an ML model, after deploying it, or even during the training stage, we may have to go back to the first stage, data preparation, in order to generate more useful features and thereby improve the model’s performance and predictive accuracy.
