How to Deploy a Machine Learning Web App From Scratch
Every passionate machine learning developer enjoys creating, training, and hunting for the best-fitted model for their use case. But one question remains: how do you put these models into production and make them ready to be consumed?
In real-world industry, the majority of Artificial Intelligence use cases never make it past the POC phase, which is very frustrating!
In this post, we will go through the entire life-cycle of a machine learning model: from data retrieval to model serving.
We will use the IBM Watson Marketing Customer Value Data gathered from Watson Analytics to create a Dash web app simulator that lets you change feature values and get an updated score at each simulation. The app is deployed here on the Heroku platform. I will show you how I got that web app served, and also share some tricks to avoid wasting a lot of time solving issues that can occur while deploying a Dash app.
The related github repo is here:
Step 0: I created a new repo as follows, then I cloned it:
The picture above shows the global structure at the end of the project. However, you should create the same directory structure at the very beginning.
Step 1: Retrieve the data
I simply downloaded it from kaggle, you can also use the kaggle-API.
Then load the data:
import pandas as pd
df = (pd.read_csv('../data/Customer-Value-Analysis.csv')
.set_index('Customer')
)
Step2 : data Pre-porcessing :
In this step we can construct our own pipeline so that it can be re-used later on new incoming data.
First separate the target, numerical and categorical features:
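As a minimal sketch of this split (the column names below are assumptions based on the IBM Watson Marketing Customer Value dataset; adjust them to your CSV):

```python
import pandas as pd

# Tiny stand-in for the real dataset; 'Response' is assumed to be the target
df = pd.DataFrame({
    'Customer': ['A1', 'A2'],
    'Response': ['Yes', 'No'],
    'Income': [56274, 0],
    'Coverage': ['Basic', 'Extended'],
}).set_index('Customer')

# Separate the target from the features
target = df['Response'].map({'Yes': 1, 'No': 0})
features = df.drop(columns=['Response'])

# Split feature names by dtype
num_features = features.select_dtypes(include='number').columns.tolist()
cat_features = features.select_dtypes(exclude='number').columns.tolist()
print(num_features, cat_features)
```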
Custom Pipeline:
With many data transformation steps, it is recommended to use the Pipeline class provided by Scikit-learn, which helps apply sequenced transformations in the right order. The parallel branches can be combined with the FeatureUnion estimator, also offered by scikit-learn: it applies a list of transformer objects in parallel to the input data, then concatenates the results.
In this project, I implemented the following code (source from this kaggle notebook)
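The sketch below illustrates the pattern (it is not the exact code from the kaggle notebook; column names and steps are illustrative): one sub-pipeline for numerical features, one for categorical features, both combined with FeatureUnion.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

class ColumnSelector(BaseEstimator, TransformerMixin):
    """Select a subset of DataFrame columns so that each FeatureUnion
    branch works on numerical or categorical features only."""
    def __init__(self, columns):
        self.columns = columns
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return X[self.columns]

num_pipeline = Pipeline([
    ('select', ColumnSelector(['Income'])),
    ('scale', StandardScaler()),
])
cat_pipeline = Pipeline([
    ('select', ColumnSelector(['Coverage'])),
    ('onehot', OneHotEncoder(handle_unknown='ignore')),
])

# FeatureUnion runs both branches in parallel, then concatenates the results
full_pipeline = FeatureUnion([
    ('num', num_pipeline),
    ('cat', cat_pipeline),
])

X = pd.DataFrame({'Income': [10000, 20000, 30000],
                  'Coverage': ['Basic', 'Extended', 'Basic']})
X_prepared = full_pipeline.fit_transform(X)
print(X_prepared.shape)  # 3 rows: 1 scaled numeric + 2 one-hot columns
```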
Apply the data transformation: call fit_transform() on the train dataset and transform() on the test subset.
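The point of that split can be sketched with a standalone scaler in place of the full pipeline (the principle is identical): the transformer is fitted on the training data only, so no information leaks from the test set.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.arange(20, dtype=float).reshape(10, 2)
X_train, X_test = train_test_split(X, test_size=0.3, random_state=42)

scaler = StandardScaler()
X_train_prep = scaler.fit_transform(X_train)  # learns mean/std from train only
X_test_prep = scaler.transform(X_test)        # reuses the train statistics

print(X_train_prep.shape, X_test_prep.shape)
```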
Step 3: Training phase:
Open a new jupyter notebook inside the /src folder.
1. Model selection: This consists of testing out different types of algorithms and evaluating their performance using both cross-validation and a train/test split, with the log loss metric. We tested KNN, XGBoost and Random Forest classifiers.
We display the obtained performances
=> We notice that Random Forest seems to perform best, so we'll fine-tune it.
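The model-selection loop can be sketched as follows, with cross-validated log loss on synthetic data (XGBoost is omitted here to keep the example dependency-free):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=42)

models = {
    'KNN': KNeighborsClassifier(),
    'RandomForest': RandomForestClassifier(n_estimators=100, random_state=42),
}
scores = {}
for name, model in models.items():
    # 'neg_log_loss' because scikit-learn maximizes scores; negate it back
    cv = cross_val_score(model, X, y, cv=5, scoring='neg_log_loss')
    scores[name] = -cv.mean()
    print(f'{name}: log loss = {scores[name]:.3f}')
```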
2. Model Fine-Tuning:
The most common way is GridSearchCV, which evaluates all the possible combinations of hyper-parameter values using cross-validation.
For example, the following code looks for the best combination of hyper-parameter values for the RandomForestClassifier:
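A sketch of such a grid search (the hyper-parameter grid and the synthetic data are illustrative, not the exact ones from the project):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=8, random_state=42)

# Illustrative grid: 2 x 2 = 4 combinations, each cross-validated
param_grid = {
    'n_estimators': [50, 100],
    'max_depth': [4, 8],
}
grid_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=3,
    scoring='neg_log_loss',  # log loss, negated because sklearn maximizes
)
grid_search.fit(X, y)
print(grid_search.best_params_)
```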
PS: In the official doc, the scoring parameter takes the string name of your metric ('accuracy', 'roc_auc', etc.; the exhaustive list of metrics supported by the scoring parameter can be found here). For metrics that are not defined in that list (as is unfortunately our case), use the sklearn.metrics.make_scorer function.
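make_scorer wraps any metric function into a scorer usable by GridSearchCV or cross_val_score. The sketch below uses f1_score purely to show the mechanism (f1_score actually has a built-in 'f1' string alias; your own custom metric would be passed the same way):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=150, random_state=0)

# Wrap the metric function into a scorer object
f1_scorer = make_scorer(f1_score)
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                         cv=3, scoring=f1_scorer)
print(scores.mean())
```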
The best combination is stored in the best_estimator_ attribute:
sk_best = grid_search.best_estimator_
3. Model persistence:
Before we get into developing the Dash app, we have to 'persist' the trained objects that will be used later by the app, within the /model directory:
- The one-hot encoding categories (we will use them later to display feature importance)
- sk_best: the best fine-tuned model found by the grid search
- The best model's performances: we save the scores for different metrics such as precision_score, recall_score, accuracy_score and f1_score (we will use them later to display a pretty bar chart in our Dash app)
Now that we have trained and persisted our model and other useful elements, we can start developing the dash app ;)!
Create your dash app
The dash code can be found here
Now that our models are trained, we will take care of implementing our dash app. We can schematize the web app as follows:
Our Dash app uses a Flask server under the hood and will be deployed on the Heroku platform, which supports Flask-based apps. Once deployed, the server will interact with our pre-trained sklearn model to update the prediction.
Now we have to design how the dynamic elements will be displayed in the web browser.
For simplicity, we opted to create the app as follows (it can be improved):
We create plotly.graph_objects figures corresponding to every element shown above (except the prediction score field; we will get to it later):
This requires some explanations:
- To get the feature importance corresponding to our model, we used the feature_importances_ attribute of the sk_best model (found after fine-tuning the random forest model in the previous step).
- We retrieve the previously saved performances into a perfs dictionary variable that will be displayed as a horizontal bar chart.
- For every numerical feature we create a corresponding Slider element.
- For every categorical feature we create a corresponding Dropdown element.
Now all we have to do is combine the created HTML elements within the app's principal layout:
The className elements refer to CSS classes that are defined in this gist.
The CSS link must be set in the external_stylesheets parameter so that the Dash app knows to include the CSS from an external resource (I borrowed the idea from here).
Please note that this part took me a lot of time that I want to save you. As shown above, Dash allows you to import external CSS resources. The question is how to create your own CSS resources and make them exploitable. One solution consists in saving them in a gist within your GitHub profile. However, when you request your gist, the response is usually served with a Content-Type: text/plain header. As a result, your browser won't actually interpret it as CSS.
The solution consists in passing the related URL to the raw.githack.com proxy, which relays the app's requests with the correct Content-Type header.
I hosted the CSS at /assets/style.css inside my GitHub root project, then routed it through the raw.githack proxy: I retrieved the production URL and added it to the external_stylesheets Dash app parameter.
Let’s get back to our Dash app. In the previous code snippet, you can notice that I added a new HTML division with id=prediction_result; it identifies the related HTML element (our score text) so we can update it dynamically. So we need to make the different HTML components interact with each other.
Dash makes this possible via app.callback functions: we will implement a callback function that dynamically changes the HTML element identified by prediction_result every time the value of another element changes, without reloading the page:
To test out your code, add a run.py script inside your root project folder containing:
from src.app_dash import server
where app_dash.py is the Dash app script (you can find it here).
Then you can either use the Flask development server (packaged with the Dash app) or gunicorn (which is recommended).
With gunicorn:
$ gunicorn run:server
Deploy on heroku:
To succeed in your deployment on Heroku you have to provide two special files within your project's root:
- Procfile: It gives the instructions to execute when starting the application.
web: gunicorn run:server
- requirements.txt: lists the libraries to install. To capture all the required packages for your project you can use this command:
pip freeze > requirements.txt
Before using Heroku, commit all changes to your GitHub repo.
1. Install the Heroku CLI:
First of all you have to install Heroku; according to your OS, install it as described here: https://devcenter.heroku.com/articles/heroku-cli
2. Create a new heroku app:
Example:
$ heroku create ibm-customer-churn-simulator
3. Connect your app to your Github repos:
Now that the new app is created, all we have to do is connect it to our GitHub repo: click on it, go to the Deploy tab and choose GitHub in the Deployment method section:
Then select your repository and the corresponding branch.
Finally, click on the Deploy Branch button.
Now comes the stressful moment... waiting for the deployment to succeed!
If everything goes well, you get this green message:
And finally you get your app :
Here the link to the app, hosted on heroku.
Conclusion:
Thanks for reading my post! If you want to go deeper, I would really recommend this excellent post ❤ for advanced deployment (from how to scrape data to how to deploy a deep-learning model with Docker, passing through how to develop a REST API).
Please note that I implemented the exact same app using the solution provided by our company, Prevision.io; I wrote a related post about it here. Since I have access to both our SaaS solution (for auto-ml processes) and our PaaS service (the Store where apps are deployed), I tested our proprietary solution. In this post, however, I used only open-source tools. If you read my other post, you will notice that the difficulty level is really not the same!
Do not hesitate to reach out if you have questions or issues.
My email address : zghrib@outlook.fr
My linkedin : here