(warning) The Data Prep (Paxata) documentation is now available on the DataRobot public documentation site. See the Data Prep section for user documentation and connector information. After the 2021.2 SP1 release, the content on this site will be removed and replaced with a link to the DataRobot public documentation site.

Predict Tool for scoring your data

When you have data that needs to be scored against a machine learning model deployed in DataRobot, use the Predict tool to generate the scores. To generate the scores:

  1. Select the Predict tool and provide your DataRobot API token, which is used to retrieve the list of your DataRobot deployments. Note: you obtain your token in the DataRobot application, under "API Key Management" in your user profile.

  2. Select the deployment. Your data will be scored against the model in this deployment. If the model used for scoring is a time series model, you must indicate this by selecting the Time Series Model checkbox, and then go to the Options tab to specify the Forecast Point and, optionally, the Series Id. Both of these options are explained below.
     Note: deployments for custom models are not currently supported.
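
As a rough illustration of what the API token is used for in step 1, the sketch below builds (but does not send) an authenticated request for the list of deployments. The base URL, the `/api/v2/deployments/` path, the `Bearer` header scheme, and the function name `build_deployments_request` are assumptions made for illustration. They do not describe how the Predict tool is implemented, so check them against your own DataRobot installation.

```python
from urllib.request import Request

# Assumed base URL for a managed DataRobot instance; yours may differ.
DEFAULT_BASE_URL = "https://app.datarobot.com"


def build_deployments_request(api_token: str, base_url: str = DEFAULT_BASE_URL) -> Request:
    """Build (without sending) a GET request for the list of deployments."""
    url = base_url.rstrip("/") + "/api/v2/deployments/"  # assumed endpoint path
    headers = {"Authorization": f"Bearer {api_token}"}   # assumed auth scheme
    return Request(url, headers=headers, method="GET")


# "YOUR_API_TOKEN" is a placeholder for the token from "API Key Management".
request = build_deployments_request("YOUR_API_TOKEN")
print(request.get_method(), request.full_url)
```

The request is only constructed here, so the sketch can be run without network access or a valid token.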

By default, the new column for the prediction score will be created as “Target” in the dataset. To change this name, click the Options tab and provide a different name in the “Prediction Column” field.

After you select the deployment, the prediction runs and the new column is created with the prediction score. The “Target Prediction Value” column is also generated to provide the associated prediction value for each score. For multiclass predictions, the prediction values are returned per class. For example, when classifying images as “apple”, “orange”, or “pear”, three additional columns are returned, one value for each corresponding score.

Examples of use case prediction values:

  • Predict the probability that a hospital patient may be readmitted after discharge. The prediction column will contain a binary value of 1 or 0 to indicate if the patient is likely to be readmitted or not readmitted.

  • Classify a set of images into one of three fruits: oranges, pears or apples. The prediction column will contain one of three values: orange, pear, apple.

  • Forecast sales based on forecast dates. The prediction column in this case will contain the sales dollar amount.

For binary and time series prediction deployments, the Options tab provides additional options, which are explained below.

Options

For time series predictions, you must also provide the Forecast Point: the point in time from which you are making a prediction, that is, a relative “if it was now…” time. DataRobot trains models using all potential forecast points in the training data. In production, the forecast point is typically the most recent time.
IMPORTANT: the date must be in ISO 8601 format, for example 2014-08-12T00:00:00Z
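
If the forecast point comes from a script or another system, a small helper can produce the required date format. The sketch below is one way to do it with the Python standard library. The function name `format_forecast_point` is hypothetical, and treating timezone-naive values as UTC is an assumption.

```python
from datetime import datetime, timezone


def format_forecast_point(moment: datetime) -> str:
    """Render a datetime as UTC in the form 2014-08-12T00:00:00Z."""
    if moment.tzinfo is None:
        # Assumption: naive datetimes are already expressed in UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(format_forecast_point(datetime(2014, 8, 12)))  # 2014-08-12T00:00:00Z
```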

Optionally, if your dataset contains multiseries data (for example, multiple time series used to forecast sales for multiple stores), you can specify a column as the Series Id to group the data and return the predictions separately for each group.
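
To make the grouping idea concrete, the sketch below splits a small, made-up sales dataset into one series per store. The rows, the column names `store`, `date`, and `sales`, and the function `group_by_series` are hypothetical. The sketch only illustrates what a Series Id column does conceptually and is not the tool's implementation.

```python
from collections import defaultdict

# Hypothetical rows: "store" plays the role of the Series Id column.
rows = [
    {"store": "A", "date": "2014-08-11", "sales": 130.0},
    {"store": "B", "date": "2014-08-10", "sales": 95.0},
    {"store": "A", "date": "2014-08-10", "sales": 120.0},
    {"store": "B", "date": "2014-08-11", "sales": 90.0},
]


def group_by_series(rows, series_id_column, date_column="date"):
    """Split rows into one date-ordered list per Series Id value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[series_id_column]].append(row)
    for series_rows in groups.values():
        # ISO-formatted date strings sort chronologically as plain text.
        series_rows.sort(key=lambda r: r[date_column])
    return dict(groups)


series = group_by_series(rows, "store")
for series_id, series_rows in sorted(series.items()):
    print(series_id, [r["sales"] for r in series_rows])
```

Each store ends up with its own ordered history, which is the unit a per-series forecast would be made for.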

For binary predictions, the Options tab provides Prediction Explanations, which help you understand why a prediction was returned. For example: “Why did this patient score a 1 for possibility of readmission?” or “Why was this image identified as an apple?”

When Explanations are enabled, five new columns are generated, per explanation, in the Project:

  • feature: the name of the feature contributing to the prediction.

  • feature value: the value the feature took on for the row.

  • strength: the amount this feature's value affected the prediction.

  • qualitative: a human-readable description of how strongly the feature affected the prediction. For example: ++++; -; +

  • label: describes what output was driven by this prediction explanation. For regression projects, it is the name of the target feature. For classification projects, it is the class whose probability, if increased, would correspond to a positive strength of this prediction explanation.

Additionally, you can set Low and High Threshold values so that explanations are generated only for scores outside the thresholds.
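
The effect of the thresholds can be sketched as a simple filter: a score below the Low Threshold or above the High Threshold qualifies for explanations, and anything in between does not. The function name `needs_explanation`, the example values, and the treatment of scores that fall exactly on a threshold are assumptions for illustration.

```python
def needs_explanation(score: float, low: float, high: float) -> bool:
    """Return True when the score lies outside the band between low and high."""
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    # Assumption: a score exactly on a threshold counts as inside the band.
    return score < low or score > high


scores = [0.05, 0.40, 0.93]
explained = [s for s in scores if needs_explanation(s, low=0.10, high=0.90)]
print(explained)  # [0.05, 0.93]
```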

See Prediction Explanations for complete details on the values returned for predictions.