Warning: The Data Prep (Paxata) documentation is now available on the DataRobot public documentation site. See the Data Prep section for user documentation and connector information. After the 2021.2 SP1 release, the content on this site will be removed and replaced with a link to the DataRobot public documentation site.

Update Project Datasets

Introduction

When you add a dataset to your Project (as the base dataset, through a lookup, or through an append), you are identifying a specific version of that dataset to use in your Project. If newer versions of the dataset become available in the Library, your Project does not automatically use them, because the work you've done in your Project, and its results, may depend on the specific dataset versions you originally selected.
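The version-pinning behavior described above can be modeled with a small data structure. The following is a minimal Python sketch, assuming a hypothetical `Library` that keeps numbered versions per dataset and a hypothetical `DatasetRef` that a Project holds. It is a conceptual model only, not Paxata's API or implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Library:
    """Hypothetical Library: maps a dataset name to its versions (version 1 first)."""
    versions: dict[str, list[str]] = field(default_factory=dict)

    def import_version(self, name: str, content: str) -> int:
        # Each import appends a new version; earlier versions are kept.
        self.versions.setdefault(name, []).append(content)
        return len(self.versions[name])

    def latest(self, name: str) -> int:
        return len(self.versions[name])

    def read(self, name: str, version: int) -> str:
        return self.versions[name][version - 1]


@dataclass(frozen=True)
class DatasetRef:
    """Hypothetical Project reference: a dataset name pinned to one version."""
    name: str
    version: int


library = Library()
library.import_version("sales", "v1 rows")
ref = DatasetRef("sales", library.latest("sales"))  # the Project pins version 1

library.import_version("sales", "v2 rows")  # a newer version arrives in the Library
print(ref.version, library.read(ref.name, ref.version))  # prints: 1 v1 rows
```

Because the reference stores an explicit version number, importing version 2 changes what the Library offers but not what the Project reads.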

Often, this is exactly what you want. At other times, you may want to update the Project's datasets. There are two methods for updating a Project's datasets:

  • Refresh a Project dataset to the newest version of an existing dataset.
  • Replace a Project dataset with another dataset.


Overview of refreshing a dataset

Refreshing a dataset updates the Project data to use the most current version of a dataset.

For example, if you start your Project with version one of a dataset, and newer versions of that dataset are later imported into the Library (through manual import or through automation), you have the option to refresh the dataset in your Project to use the newest version.

When a dataset can be refreshed

A dataset can be refreshed when:

  • there is a newer version of the dataset in the Library.
  • the Interactive Mode feature is enabled and the interactive portion size for datasets has changed.
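The two conditions above can be expressed as a single check. This is a minimal sketch under stated assumptions: the function name `can_refresh` and its parameters (version numbers and interactive portion sizes in rows) are hypothetical and only mirror the listed rules; they are not part of Paxata.

```python
from typing import Optional


def can_refresh(
    pinned_version: int,
    latest_version: int,
    interactive_mode: bool = False,
    pinned_portion_rows: Optional[int] = None,
    current_portion_rows: Optional[int] = None,
) -> bool:
    """Hypothetical check mirroring the two refresh conditions."""
    # Condition 1: the Library holds a newer version than the one the Project uses.
    if latest_version > pinned_version:
        return True
    # Condition 2: Interactive Mode is on and the interactive portion size changed.
    if (
        interactive_mode
        and pinned_portion_rows is not None
        and current_portion_rows is not None
    ):
        return pinned_portion_rows != current_portion_rows
    return False


print(can_refresh(1, 3))                       # True: newer version exists
print(can_refresh(3, 3, True, 10_000, 50_000))  # True: interactive portion changed
print(can_refresh(3, 3))                       # False: nothing to refresh
```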

When a dataset can be refreshed in a Project, the interface provides the following visual cues.

The Refresh Datasets button is located on the Steps panel.

  • Green indicates a newer version of one or more datasets used by your Project is detected.
  • Gray indicates there are no newer versions of your Project’s datasets.

The Use Latest button is located on the Refresh Datasets panel. When it is green, a newer version of that dataset is available.


The file details link opens a Version Information panel that shows the number of new rows and columns in the dataset's latest version. If your Project is in Interactive Mode and the dataset contains more rows than the interactive portion, the panel also shows an Interactive column that lists the number of rows you can bring into the Project. This number lets you quickly see whether the interactive portion has increased or decreased, and then decide whether you want to refresh the dataset.

Note: All Paxata Projects have a maximum Project row limit, set by the Paxata System Administrator. If you are close to that limit and your Administrator cannot increase it, you can choose which datasets to update to their latest versions, so that you can keep bringing newer data into your Project without exceeding the Project row limit.


If you deselect a dataset, the Use Latest button turns dark gray. This indicates that there is a newer version of the dataset and that you have chosen not to update the dataset.

When there are no new versions for the dataset, the Use Latest button is light gray.
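The button colors described above follow a simple rule that can be sketched in Python. The enum and function names here are hypothetical and model only the documented colors; they are not Paxata code.

```python
from enum import Enum


class UseLatestState(Enum):
    GREEN = "green"            # a newer version exists and the dataset is selected
    DARK_GRAY = "dark gray"    # a newer version exists, but you deselected the dataset
    LIGHT_GRAY = "light gray"  # no newer version exists


def use_latest_state(has_newer_version: bool, selected: bool) -> UseLatestState:
    """Hypothetical color of the Use Latest button for one dataset."""
    if not has_newer_version:
        return UseLatestState.LIGHT_GRAY
    return UseLatestState.GREEN if selected else UseLatestState.DARK_GRAY


def refresh_button_is_green(newer_version_flags: list[bool]) -> bool:
    """Hypothetical: Refresh Datasets is green if any Project dataset has a newer version."""
    return any(newer_version_flags)


print(use_latest_state(True, False).value)       # dark gray
print(refresh_button_is_green([False, True]))    # True
```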


Refresh the datasets

Follow these steps to refresh a dataset to the latest version.

  1. In the project, click Steps.
     Result: The Steps panel opens.
  2. Click Refresh Datasets.
     Result: The Refresh Datasets panel appears. All datasets that can be refreshed are selected by default.
  3. Select the datasets to refresh:
      • To refresh all refreshable datasets, click All, or skip to the next step (all datasets that can be updated are selected by default).
      • To refresh individual datasets, deselect the datasets you don't want to refresh, or click None and then select the datasets you do want to refresh.
  4. Click Save.
     Result: The project data is updated to the most current versions of the selected datasets.
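When you are close to the Project row limit (see the Note earlier in this section), choosing which datasets to refresh amounts to fitting row increases under a budget. The following Python sketch is one simple greedy approach, not Paxata's behavior. It assumes, for illustration only, that refreshing a dataset adds a known number of rows (such as the new-row count from the Version Information panel) and that these additions simply sum; the function name and parameters are hypothetical.

```python
def choose_refreshes(
    current_rows: int,
    row_limit: int,
    candidates: list[tuple[str, int]],
) -> list[str]:
    """Pick datasets to refresh, in priority order, without exceeding row_limit.

    candidates: (dataset name, rows added by its latest version), highest priority first.
    """
    chosen: list[str] = []
    total = current_rows
    for name, added_rows in candidates:
        # Skip any refresh that would push the Project past its row limit.
        if total + added_rows <= row_limit:
            chosen.append(name)
            total += added_rows
    return chosen


picks = choose_refreshes(
    900_000,
    1_000_000,
    [("orders", 80_000), ("returns", 50_000), ("regions", 5_000)],
)
print(picks)  # ['orders', 'regions']
```

Here "returns" is skipped because adding it after "orders" would bring the total to 1,030,000 rows; the smaller "regions" refresh still fits.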


Overview of replacing a dataset

Unlike refreshing, replacing a dataset lets you decide which dataset, or which specific version of a dataset, to use in your Project. For example, if you started a Project with version one of a dataset and five additional versions were imported, replacing the dataset lets you pick the exact version to use, which may not be the latest. Replacing also lets you switch your Project to an entirely different dataset.
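The difference between refreshing and replacing can be sketched in Python. This minimal model assumes a hypothetical Library that records only how many versions each dataset has (here, the six versions from the example above plus a hypothetical second dataset); `refresh` and `replace_dataset` are illustrative names, not Paxata functions.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DatasetRef:
    """Hypothetical Project reference: a dataset name pinned to one version."""
    name: str
    version: int


# Hypothetical Library contents: dataset name -> number of available versions.
LIBRARY_VERSIONS = {"sales": 6, "sales_eu": 2}


def refresh(ref: DatasetRef) -> DatasetRef:
    # Refresh: always moves to the newest version of the same dataset.
    return replace(ref, version=LIBRARY_VERSIONS[ref.name])


def replace_dataset(name: str, version: int) -> DatasetRef:
    # Replace: any existing version of any Library dataset, not necessarily the latest.
    if not 1 <= version <= LIBRARY_VERSIONS.get(name, 0):
        raise ValueError(f"{name} has no version {version}")
    return DatasetRef(name, version)


start = DatasetRef("sales", 1)
print(refresh(start))               # DatasetRef(name='sales', version=6)
print(replace_dataset("sales", 4))  # DatasetRef(name='sales', version=4)
```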

Replace a dataset

Follow these steps to replace a dataset used in a project.

  1. In the project, click Steps.
     Result: The Steps panel opens.
  2. Click the step with the dataset you want to update.
  3. In the Steps panel, click Edit.
     Result: The project returns to the state it was in when the selected step was created.
  4. Above the Data Preview panel, click the name of the dataset you want to update.
     Result: The Select Datasets screen appears.
  5. Select the dataset you want to use:
      • To use a previous version of a dataset, click All Versions on the dataset, and then click Select on the version you want to use.
      • To use a different dataset in your Library, click Select on that dataset.
  6. Click Save.
     Result: The project's data is updated.


Glossary

The following are definitions of terms used in this document.

  • AnswerSet™: Like a dataset, except that it is the published result of your data prep.
  • Base dataset: The data on which all other actions in the Project are performed.
  • Data source: The source of your dataset.
  • Dataset: Data that is imported into the Data Library.
  • Filtergram: The combination of the functionality of filters with the power of histograms.