Before you add a repository to Deeploy, the repository must meet certain requirements. Follow these requirements carefully, because a repository that does not meet them can cause your deployment to fail. This article describes the basics of preparing a repository. For more in-depth information, see Repository.
Repository Contract
In order to successfully create a Deployment from a Repository, the git repository needs to adhere to a certain format. We call this format the Contract. Specifications for this format can be found below.
repository (or contract path)
|__ model
|   |__ reference.json or model.<extension>
|
|__ explainer
    |__ reference.json or explainer.<extension>
An example of the Contract applied in a repository can be found below (the model is an XGBoost Booster artefact).
# this is the Contract as applied in example repository:
# https://gitlab.com/deeploy-ml/sample-models/iris-proba/-/tree/master/model
repository (or contract path)
|_ model
   |_ model.bst
...
It is also possible to place the contract in a sub-folder within your Repository. Please check out this article on how to use it with the Python SDK, or consult your Swagger API documentation for the attribute required in the create or update Deployment flow. Also make sure to have a look at the reserved repository folder names for Deeploy deployments.
Model Folder
The model folder is mandatory and contains your (reference to the) machine learning model. If you do not have this folder, creating a deployment from this repository will fail.
Deeploy expects a model file containing a trained model. See Recommended Framework Versions for the required file type per framework. Make sure you have only one model file, or reference only one model file, named model.<extension>. Be aware that if you don't use the exact naming (e.g., you use my_model.<extension>), your deployment will fail.
Explainer Folder
The explainer folder is optional and contains an explainer for your model. For saving and loading explainers, we use the dill library. This is why explainers generally have the form explainer.dill. For recommended versions, see Recommended Framework Versions. Make sure you have only one explainer file, or reference only one explainer file, named explainer.<extension>. Be aware that if you don't use the exact naming (e.g., you use my_explainer.<extension>), your deployment will fail.
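The naming rules for the model and explainer folders can be checked locally before you push. The sketch below is an illustration only and not part of Deeploy or its SDK: the function names check_folder and check_contract are hypothetical, and the check assumes that files other than reference.json and model.<extension> (or explainer.<extension>) in a folder can be ignored.

```python
from pathlib import Path
from typing import List


def check_folder(folder: Path, stem: str) -> List[str]:
    """Return problems for one contract folder (an empty list means OK)."""
    if not folder.is_dir():
        return [f"missing folder: {folder.name}/"]
    files = [p for p in folder.iterdir() if p.is_file()]
    # Either a reference.json or a file named exactly <stem>.<extension>.
    references = [p for p in files if p.name == "reference.json"]
    artefacts = [p for p in files if p.name.startswith(stem + ".")]
    if len(references) + len(artefacts) != 1:
        return [
            f"{folder.name}/ must hold exactly one of "
            f"reference.json or {stem}.<extension>"
        ]
    return []


def check_contract(contract_path: str) -> List[str]:
    """Check the model folder (mandatory) and explainer folder (optional)."""
    root = Path(contract_path)
    problems = check_folder(root / "model", "model")
    if (root / "explainer").exists():
        problems += check_folder(root / "explainer", "explainer")
    return problems
```

Calling check_contract(".") from the repository root (or from the contract path) returns an empty list when the layout matches the Contract, and a list of readable problems otherwise.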
Reference System and Provenance
For repositories where individual file sizes exceed 100MB, or where the total size exceeds 1GB, we recommend using the reference system. When you use the reference system, the exact model is no longer stored in the git version history. To keep the same level of provenance, always upload a new file to blob storage and update your reference when you want to deploy a new version. Do not overwrite existing files in blob storage.
The reference system works by replacing the contents of the ./model and ./explainer folders with a reference.json that refers to a folder in blob storage or a containerized model/explainer image. Currently, Deeploy supports AWS S3, Azure Blob Storage, and GCS. An example of a reference.json file can be found below.
# this reference is used in example repository:
# https://gitlab.com/deeploy-ml/sample-models/example-sklearn-census/-/tree/master/model
{ "reference": { "blob": {
"url": "s3://deeploy-examples/sklearn/census/20220801171658/model"
} } }
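Because existing blobs should never be overwritten, each new model version needs its own blob location. The sketch below shows one way to generate a fresh reference per version. It assumes a timestamped folder layout like the one in the example URL above; that layout is inferred from the example and is not a documented Deeploy requirement, and new_blob_reference is a hypothetical helper.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def new_blob_reference(base_url: str, now: Optional[datetime] = None) -> dict:
    """Build a reference.json payload that points at a new, unique blob folder.

    base_url is the bucket prefix, e.g. "s3://my-bucket/my-model" (hypothetical).
    """
    stamp = (now or datetime.now(timezone.utc)).strftime("%Y%m%d%H%M%S")
    url = f"{base_url.rstrip('/')}/{stamp}/model"
    return {"reference": {"blob": {"url": url}}}


def write_reference(path: str, reference: dict) -> None:
    """Write the payload to e.g. model/reference.json."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(reference, handle, indent=2)
```

Upload the new model file to the generated location first, then commit the updated reference.json, so that the git history records which blob each version used.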
If the reference refers to a Docker image, the reference.json file looks as follows.
# this reference is used in example repository:
# https://gitlab.com/deeploy-ml/sample-models/example-custom-image-hello-world/
{
  "reference": {
    "docker": {
      "image": "deeployml/example-custom-image-hello-world:0.3.0",
      "port": 8080
    }
  }
}
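A small syntax error in reference.json, such as a missing comma, is easy to overlook. The sketch below checks a reference.json document against the two shapes shown above. It is an illustration, not Deeploy's own validation: check_reference is a hypothetical helper, and the rules that exactly one of "blob" or "docker" is present and that "port" is an optional integer are assumptions drawn from the examples.

```python
import json
from typing import List


def check_reference(text: str) -> List[str]:
    """Return problems found in a reference.json document (empty means OK)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} (line {exc.lineno}, column {exc.colno})"]

    reference = data.get("reference") if isinstance(data, dict) else None
    if not isinstance(reference, dict):
        return ['missing top-level "reference" object']

    problems = []
    blob, docker = reference.get("blob"), reference.get("docker")
    # Assumption: a reference points at either blob storage or an image.
    if (blob is None) == (docker is None):
        problems.append('expected exactly one of "blob" or "docker"')
    if blob is not None and not (
        isinstance(blob, dict) and isinstance(blob.get("url"), str)
    ):
        problems.append('"blob" needs a string "url"')
    if docker is not None:
        if not (isinstance(docker, dict) and isinstance(docker.get("image"), str)):
            problems.append('"docker" needs a string "image"')
        elif "port" in docker and not isinstance(docker["port"], int):
            problems.append('"port" should be an integer')
    return problems
```

Running this on the contents of model/reference.json before committing catches malformed JSON and missing keys early.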
In case the artefact storage or registry is not publicly available, credentials can be added to the Deeploy Keychain in order to access the private storage locations. Read more about this here:
- Private artefact storage
- Private image repositories
- IAM roles (AWS)
Monorepo: multiple contract paths per repository
To deploy multiple models from one repository, Deeploy supports repository contract paths. This feature is available for deployments created through the Python SDK, the API, and the frontend. You place each Deeploy contract in its own subfolder of the repository, as shown in the example below.
repository
|
|_subfolder_1
| |_ model
| | |_ reference.json
| |_ explainer
| | |_ reference.json
| |_ metadata.json
|
|_subfolder_2
  |_ model
    |_ reference.json
To create a deployment from a contract path, provide the relative path from the repository root as input to the Python SDK or Core API. In the example above, this is "subfolder_1" or "subfolder_2".
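To see which contract paths a monorepo offers, you can list every folder that contains a model folder. The sketch below is a local helper only and does not call the Python SDK or the API. The name find_contract_paths is hypothetical, and it assumes that any directory named model marks a contract path.

```python
from pathlib import Path
from typing import List


def find_contract_paths(repo_root: str) -> List[str]:
    """List contract paths relative to the repository root.

    A contract at the repository root itself is reported as ".".
    """
    root = Path(repo_root)
    paths = []
    for model_dir in sorted(root.rglob("model")):
        relative = model_dir.relative_to(root)
        # Skip files named "model" and anything inside the .git directory.
        if model_dir.is_dir() and ".git" not in relative.parts:
            paths.append(relative.parent.as_posix())
    return paths
```

For the example layout above, the result would be the two relative paths "subfolder_1" and "subfolder_2", which are the values to pass as the contract path.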
Metadata
The metadata.json file can be used to provide metadata about your Deployments. It can be placed in the root of your Repository (or in the contract folder selected via the contractPath attribute), as in the structure below.
repository (or contract path)
|__ metadata.json
...
The following types of metadata are supported. An example can be found in our Public Scikit-Learn Census Repository:
- featureLabels: An array of strings that labels the output of your SHAP explainer. The order of the labels follows the order of the SHAP values. E.g., if 10 SHAP values are expected, 10 feature labels can be provided in the metadata.json, where the first feature label corresponds to the first SHAP value.
- predictionClasses: An object with keys and values that describes the predictions of your model. Every key in the object provides a category name for the corresponding model outcome. In the example, a model outcome of true means the person is credit worthy, so the key is "Credit worthy". This category name is used to label our monitoring graph and is also displayed in the Deployment details.
- problemType: A string with the value "classification" or "regression". This value is used to calculate the right performance metric for the type of machine learning problem your model is solving. If the problemType is "classification", the performance is calculated as accuracy. If the problemType is "regression", the performance is calculated as RMSE.
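To make the effect of problemType concrete, the sketch below computes the two metrics named above with their standard textbook definitions. It is an illustration only: the performance function is hypothetical, and Deeploy's own implementation may differ in details.

```python
import math
from typing import Sequence


def performance(problem_type: str, actuals: Sequence, predictions: Sequence) -> float:
    """Accuracy for classification, root mean squared error for regression."""
    if not actuals or len(actuals) != len(predictions):
        raise ValueError("actuals and predictions must be non-empty and equal in length")
    pairs = list(zip(actuals, predictions))
    if problem_type == "classification":
        # Fraction of predictions that match the actual outcome.
        return sum(a == p for a, p in pairs) / len(pairs)
    if problem_type == "regression":
        # Square root of the mean squared difference.
        return math.sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))
    raise ValueError('problemType must be "classification" or "regression"')
```

Note that the two metrics point in opposite directions: a higher accuracy is better, while a lower RMSE is better.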
An example of a metadata.json file is shown below.
{
  "featureLabels": [
    "age",
    "workClass",
    "education",
    "maritualStatus",
    "occupation",
    "relationship",
    "race",
    "sex",
    "capitalGain",
    "capitalLoss",
    "hoursPerWeek",
    "nativeCountry"
  ],
  "predictionClasses": {
    "Credit worthy": true,
    "Not credit worthy": false
  },
  "problemType": "classification"
}
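A metadata.json like the one above can be sanity-checked before committing, and the featureLabels can be paired with SHAP values by position. The sketch below is illustrative: check_metadata and label_shap_values are hypothetical helpers, and treating every key as optional is an assumption, not a documented rule.

```python
from typing import Dict, List, Sequence


def check_metadata(metadata: dict) -> List[str]:
    """Return problems found in a parsed metadata.json (empty means OK)."""
    problems = []
    labels = metadata.get("featureLabels")
    if labels is not None and not (
        isinstance(labels, list) and all(isinstance(x, str) for x in labels)
    ):
        problems.append("featureLabels must be an array of strings")
    classes = metadata.get("predictionClasses")
    if classes is not None and not isinstance(classes, dict):
        problems.append("predictionClasses must be an object")
    problem_type = metadata.get("problemType")
    if problem_type is not None and problem_type not in ("classification", "regression"):
        problems.append('problemType must be "classification" or "regression"')
    return problems


def label_shap_values(labels: Sequence[str], shap_values: Sequence[float]) -> Dict[str, float]:
    """Pair each feature label with the SHAP value at the same position."""
    if len(labels) != len(shap_values):
        raise ValueError("expected one feature label per SHAP value")
    return dict(zip(labels, shap_values))
```

Load the file with json.load and pass the resulting dictionary to check_metadata. The label pairing shows why the order of featureLabels matters: a label in the wrong position is attached to the wrong SHAP value.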