Prerequisites
Set up an Azure Git repo by cloning the repo from https://github.com/SaschaDittmann/MLOps-Lab.git
All prerequisite steps, such as installing Python 3.6, adding the Azure CLI ML extension, creating a compute cluster and uploading the model data, need to be completed as explained in the post “Setup MLOPS workspace using Azure DevOps pipeline”. Following the same post, create the Azure ML workspace. That post also explains how to upload the data in your cloned repo’s data folder to the workspace; uploading the data is required to train the model with it.
Ideally, however, you should run some tests before uploading the data. In order to execute Python-based tests, you first need to set up the Azure Pipelines agent with the required Python packages. These requirements are specified in the repo’s setup folder, in the install_requirements.sh script and the accompanying requirements text file. If your project has more requirements, you can define them as per your needs.
This repo uses the scikit-learn library and a few other dependencies.
A step in the build pipeline can be set up to execute install_requirements.sh as a Bash script task (this model training uses an Azure Pipelines hosted Linux (Ubuntu 16.04) agent).
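If you prefer a YAML pipeline over the classic editor, this step could be sketched roughly as below, assuming the script lives in the repo’s setup folder as described:

steps:
- task: Bash@3
  displayName: 'Install Python requirements'
  inputs:
    targetType: filePath
    filePath: setup/install_requirements.sh
    workingDirectory: setup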
Then you can execute the tests written in Python using a command like the one below, which runs the Python tests you have written to check your data quality and produces results you can publish. The sample test is available in the repo under the tests/unit folder as data_test.py.
pytest tests/unit/data_test.py --doctest-modules --junitxml=junit/test-results.xml --cov=data_test --cov-report=xml --cov-report=html
Then you can publish the test results so that they show up in the build as test results.
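In a YAML pipeline, a Publish Test Results task picks up the JUnit XML file produced by the pytest command above, and the Cobertura coverage report written by pytest-cov can be published alongside it. A minimal sketch:

steps:
- task: PublishTestResults@2
  displayName: 'Publish test results'
  condition: succeededOrFailed()   # publish results even when some tests fail
  inputs:
    testResultsFormat: JUnit
    testResultsFiles: '**/test-results.xml'
- task: PublishCodeCoverageResults@1
  displayName: 'Publish code coverage'
  inputs:
    codeCoverageTool: Cobertura
    summaryFileLocation: '$(System.DefaultWorkingDirectory)/**/coverage.xml'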
Then, as explained in the post “Setup MLOPS workspace using Azure DevOps pipeline”, you can add the Azure CLI ML extension to the agent, create the ML workspace and upload the data from the repo’s data folder.
Training the model
Once that is done, we can get started with training the model. First, create two folders on the pipeline agent to hold the metadata and model files, by executing the command below in a Bash step.
mkdir metadata && mkdir models
Then you can train the model in the ML workspace by executing the command below. The Python code to train the model, train_diabetes.py, is available in the repo. This uses the compute cluster created earlier (in the post “Setup MLOPS workspace using Azure DevOps pipeline”) via Azure CLI ML commands. The dependencies for training the model are listed in the conda_dependencies.yml file in the repo. Make sure you set the correct resource group and Azure ML workspace names for the ones you have created.
az ml run submit-script --resource-group rg-mldemodev01 --workspace-name mlw-demodev01 --experiment-name diabetes_sklearn --ct cmbamlcompute01 --conda-dependencies conda_dependencies.yml --run-configuration-name train_diabetes --output-metadata-file ../metadata/run.json train_diabetes.py
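If your build is defined in YAML, the training command can run inside an Azure CLI task, which signs in to your subscription before executing the script. This is a minimal sketch; the service connection name arm-connection and the working directory containing train_diabetes.py are assumptions for illustration:

steps:
- task: AzureCLI@2
  displayName: 'Train model on AML compute'
  inputs:
    azureSubscription: 'arm-connection'   # placeholder: your ARM service connection
    scriptType: bash
    scriptLocation: inlineScript
    workingDirectory: training            # assumed folder containing train_diabetes.py
    inlineScript: |
      az ml run submit-script --resource-group rg-mldemodev01 --workspace-name mlw-demodev01 \
        --experiment-name diabetes_sklearn --ct cmbamlcompute01 \
        --conda-dependencies conda_dependencies.yml --run-configuration-name train_diabetes \
        --output-metadata-file ../metadata/run.json train_diabetes.py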
Once the model training has completed, you can register the model in the ML workspace in Azure. While registering the model you can output the model .pkl file, which can be used as a build artifact to deploy the trained model to different ML workspaces.
az ml model register --resource-group rg-mldemodev01 --workspace-name mlw-demodev01 --name diabetes_model --run-metadata-file metadata/run.json --asset-path outputs/models/sklearn_diabetes_model.pkl --description "Linear model using diabetes dataset" --tag "data"="diabetes" --tag "model"="regression" --model-framework ScikitLearn --output-metadata-file metadata/model.json
Then you can download the trained and registered model to publish it as an artifact in the build pipeline.
az ml model download --resource-group rg-mldemodev01 --workspace-name mlw-demodev01 --model-id $(jq -r .modelId metadata/model.json) --target-dir ./models --overwrite
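In a YAML pipeline, the register and download steps can be wrapped in a single Azure CLI task along these lines; this is a minimal sketch (tags and description omitted for brevity), and the service connection name arm-connection is again a placeholder:

steps:
- task: AzureCLI@2
  displayName: 'Register and download model'
  inputs:
    azureSubscription: 'arm-connection'   # placeholder: your ARM service connection
    scriptType: bash
    scriptLocation: inlineScript
    inlineScript: |
      # register the run output as a model in the workspace
      az ml model register --resource-group rg-mldemodev01 --workspace-name mlw-demodev01 \
        --name diabetes_model --run-metadata-file metadata/run.json \
        --asset-path outputs/models/sklearn_diabetes_model.pkl \
        --output-metadata-file metadata/model.json
      # jq extracts the registered model id from the metadata file written above
      az ml model download --resource-group rg-mldemodev01 --workspace-name mlw-demodev01 \
        --model-id $(jq -r .modelId metadata/model.json) --target-dir ./models --overwrite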
Then you need to copy all the files necessary for deploying the ML model to a different Azure ML workspace, both from the repo and from the downloaded files, into the artifact staging directory, to make sure they are published as build output. The following patterns cover them (a sample Copy Files task is sketched after the list):
**/metadata/*
**/models/*
**/deployment/*
**/setup/*
**/tests/integration/*
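With a YAML pipeline, a Copy Files task using those patterns might look like the sketch below; the SourceFolder assumes the metadata and models folders were created under the agent’s default working directory:

steps:
- task: CopyFiles@2
  displayName: 'Copy deployment files to staging'
  inputs:
    SourceFolder: '$(System.DefaultWorkingDirectory)'
    Contents: |
      **/metadata/*
      **/models/*
      **/deployment/*
      **/setup/*
      **/tests/integration/*
    TargetFolder: '$(Build.ArtifactStagingDirectory)'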
The next step is to publish the artifact staging directory contents as the build output.
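The equivalent YAML step could be a Publish Build Artifacts task; the artifact name here is just a placeholder:

steps:
- task: PublishBuildArtifacts@1
  displayName: 'Publish build output'
  inputs:
    PathtoPublish: '$(Build.ArtifactStagingDirectory)'
    ArtifactName: 'mlops-artifacts'   # placeholder artifact name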
Once the build has executed, it runs the tests on the data and publishes the test results.
The data is then uploaded, and the model is trained, registered, downloaded and published together with the repo contents required for deployment to target Azure ML workspaces. In the next post we will discuss how to use the build artifacts to deploy the ML model to a different Azure ML workspace.