(warning) The Data Prep (Paxata) documentation is now available on the DataRobot public documentation site. See the Data Prep section for user documentation and connector information. After the 2021.2 SP1 release, the content on this site will be removed and replaced with a link to the DataRobot public documentation site.

Library Layout

The Library page is where you go to add new datasets to your Library and manage existing ones, including the Paxata AnswerSets that are published out from your Projects. This is also where you go to export datasets, set them up for automation, add new versions, create profiles for your datasets, and view any warnings or errors that occurred when a dataset was imported.

This article reviews the actions you can take to organize your datasets in the Library and the actions you can take for each dataset. Note: the content in this article is ordered based on a top-to-bottom view of the Library page.

Add a new dataset
To add a new dataset to your Library, click the plus button located at the top-left corner of the page. You'll be taken to the import screen to select a data source and one or more datasets to import. If there are any errors during the import, a red warning icon displays adjacent to the dataset's listing on this page. For more information on the errors, mouse over the dataset's name and click the "Edit Details" button.

Search for a dataset on the page
To search for a dataset in your Library, click the magnifying glass icon in the top-right corner of the page. A field opens where you can begin typing the name of the dataset you want to locate. Potential matches are displayed as you continue to type.

Filter the datasets displayed on the page
Use the following filters at the top of the page to narrow the list of datasets displayed:

  • Show versions: toggle to display all versions of every dataset and AnswerSet or only the latest version of each.
  • Creation time: last seven or 30 days.

  • Ready for Use? This filter displays only when the Interactive Mode feature is enabled for your Paxata Projects. It allows you to quickly see which datasets have finished loading their interactive portions and are ready for use in a Project.

  • Completed? displays all datasets that have successfully finished importing into the Library.

  • Owner: displays the datasets and AnswerSets that you have imported and created from Paxata Projects.

  • Data Source: click in this field to display all of the data sources used to import datasets. You can select more than one data source by continuing to click and select.

  • Tags: click in this field to display all of the tags currently assigned to datasets in the Library. To locate a dataset by a specific tag, type the tag name and press the Enter key. If you add multiple tags to search, only the datasets containing all of the search tags are returned as matches.
    (Tags are descriptive words that enable you to further organize your datasets. To add a new tag for a dataset, hover over the dataset and click inside the "add tags" field for that dataset.)
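
The multiple-tag behavior described above (a dataset matches only if it carries every search tag) can be pictured with a small sketch. This is not a Paxata API: the `datasets` list, its field names, and the `filter_by_tags` helper are hypothetical, and exact, case-sensitive tag matching is an assumption.

```python
# Hypothetical in-memory model of Library entries; not a Paxata API.
datasets = [
    {"name": "sales_2020", "tags": {"finance", "quarterly"}},
    {"name": "sales_2021", "tags": {"finance"}},
    {"name": "web_logs", "tags": {"ops", "quarterly"}},
]


def filter_by_tags(entries, search_tags):
    """Return entries whose tags include every search tag (AND semantics)."""
    wanted = set(search_tags)
    # `wanted <= tags` is a subset test: every search tag must be present.
    return [entry for entry in entries if wanted <= entry["tags"]]


matches = filter_by_tags(datasets, ["finance", "quarterly"])
print([entry["name"] for entry in matches])  # prints ['sales_2020']
```

Adding a second tag narrows the result rather than widening it: "sales_2021" has the "finance" tag but not "quarterly", so it is excluded.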

Columns on the Library page
You can change the column display order for the Library page by clicking a column's header and dragging it to a new location. To sort the Library list by a particular column, click that column's name. To sort on multiple columns, hold Shift and click additional columns. Note: the display and sort orders you create are temporary and are not retained when you leave the Data Library page or refresh your browser.

    • Name: displays the name of the dataset when it was imported into the Library. To change the name, hover over the dataset and click the Edit Details button. The metadata page for the dataset opens. Follow the directions in the UI help panel to change the name and update other metadata details.

    • Type: allows you to quickly identify which datasets in your Library are AnswerSets that were created from Paxata Projects. If the Interactive Mode feature is enabled for your Paxata Projects, an AnswerSet is represented with the partial icon to indicate that you were working in Interactive Mode when it was created.

    • Version #: displays the number of versions for each dataset or AnswerSet. When there is more than one version, you'll see a "Show all versions" link in the column. Clicking the link drills into all versions of the dataset. Keep in mind, if you search tags while drilled into the versions page, your search only applies to the versions page and not the entire Library page.
      Note: version numbers do not necessarily match the number of versions of a dataset that actually exist in the Library. This happens in two cases: when an import is canceled before it completes, a version number is still generated, and subsequent imports simply continue incrementing from it; and when a particular version of a dataset is deleted, the version numbers of the remaining versions are not decremented.

    • Status: describes a dataset's load status as it's being imported into the Library. In most cases, the status will quickly progress to "completed." However, for larger datasets, you will see interim states to indicate that your dataset is continuing to successfully import. The interim states you may see also depend on:

      • if the row count for the dataset can be predetermined prior to import. In most cases, Paxata knows the number of rows in a dataset before the import process begins. However, there are cases in which that count cannot be predetermined, for example, imports from Salesforce and queries on JDBC data sources. See the movies below for examples of how row count is reflected in the status loading icons.

      • if Interactive Mode is enabled for your Projects. When Interactive Mode is enabled, the status icon has two concentric circles. The inner circle represents the interactive portion of your dataset. When the interactive portion is ready to be used in a Project, the inner circle becomes a green check mark. The outer circle then begins to fill green as the remainder of the dataset continues to load into the Library. See the movie below for an example of this progression. If any errors occur while importing the interactive portion or the remainder, a red warning icon displays in the respective concentric circle to indicate which part of the dataset failed to import into the Library. See below for examples of the failure state icons.

        The following describe and illustrate the various loading states you may see on the Library page.

 Failure states

 Interactive mode not enabled: dataset failed to import.

 Interactive mode: interactive portion failed to import.

 Interactive mode: interactive portion completed successfully, but the remainder of the dataset failed to import.
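
As a rough way to read the three failure states above, the following sketch maps the outcome of an import to the state that would be shown. The `failure_state` function and its parameters are hypothetical names used only for illustration; this is a mental model of the icons, not how the application determines status.

```python
def failure_state(interactive_mode, interactive_ok, remainder_ok):
    """Return a description of the failure state, or None if the import succeeded.

    Hypothetical helper: the parameter names are illustrative assumptions.
    """
    if not interactive_mode:
        # Without Interactive Mode the dataset loads as a single unit.
        if interactive_ok and remainder_ok:
            return None
        return "dataset failed to import"
    if not interactive_ok:
        # The inner circle represents the interactive portion.
        return "inner circle warning: interactive portion failed to import"
    if not remainder_ok:
        # The outer circle represents the remainder of the dataset.
        return "outer circle warning: remainder of dataset failed to import"
    return None


print(failure_state(True, True, False))
```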

Note: you may see a "Pending" state in this column if you did not finish selecting the parsing options for the dataset. In this case, you will also see a "Click to Finish" button in the Created column. Click the button to open the import screen and finish the import.

  • # of Rows: displays the number of rows in a dataset. You can preview rows from a dataset by moving your mouse over the dataset and clicking the "show preview" link that displays in this column. While a dataset is being imported, and the row count is predetermined, the number displayed in this column continues to increase until the import is finished. If the dataset fails to import, the number of rows that did successfully import is listed in this column. In this case, "show preview" displays a preview of those rows.

  • Tags: tags are labels that you can add to your datasets to help organize your data. To add tags to a dataset, click in the Tags column for that dataset, type a tag name and click the "Add" link that displays, or press the Enter key.

  • Created: displays the user who imported the dataset and when it was imported. You may see a "Click to Finish" button in the column. This indicates the import was never initiated because the parse options were not finalized. Click this button to return to the import screen and finish the import process for the dataset. 
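
The note under Version # explains why version numbers can run ahead of the number of versions actually present in the Library. The following sketch models that behavior with a simple counter. The `VersionedDataset` class is hypothetical, and it assumes that a canceled import leaves no listed version behind; the source states only that a version number is generated.

```python
class VersionedDataset:
    """Hypothetical model of version numbering; not Paxata code."""

    def __init__(self):
        self.next_version = 1   # counter that only ever increases
        self.versions = set()   # version numbers that currently exist

    def start_import(self, completes=True):
        """Allocate a version number; keep the version only if the import completes."""
        number = self.next_version
        self.next_version += 1  # consumed even when the import is canceled
        if completes:
            self.versions.add(number)
        return number

    def delete(self, number):
        """Delete one version; the remaining numbers are not decremented."""
        self.versions.discard(number)


ds = VersionedDataset()
ds.start_import()                 # version 1
ds.start_import(completes=False)  # version 2 is generated, then canceled
ds.start_import()                 # version 3
ds.delete(1)
print(sorted(ds.versions))        # prints [3]
```

In this model the dataset shows version 3 even though only one version exists, which matches the two conditions described in the note.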

Actions you can take for a dataset

When you hover over a dataset, three buttons appear with the actions you can take for that dataset.

  • Create Project: create a new Project using the dataset as your base dataset.

  • Export: export or locally download a dataset. See the UI help and Export a Dataset for details.

  • More actions: provides a number of additional options, depending on the features that are enabled for your Paxata application.
    • Edit Details: opens the dataset's metadata page. This is where you can update the dataset's name and description. See the UI help (after opening the metadata page) for details of actions you can take and an explanation of the metadata. This is also the page where you view warnings or errors that may have occurred during import. Datasets with warnings or errors are easy to locate in the list: they are flagged with a warning icon adjacent to the dataset name, the row color for the dataset is red, and the Status icon indicates a failure state.
    • Add a new version of the existing dataset without overwriting the current version.
    • Automate the dataset (if the automation feature is enabled).
    • Profile the dataset (if the profiling feature is enabled).
    • For any AnswerSet, open a Project at the precise Step from which that AnswerSet was created.



Glossary

The following are definitions of terms used in this document.

  • AnswerSet: like a dataset, except that it is the published result of your data prep.
  • Base dataset: the data on which all other actions in the Project will be performed.
  • Data source: the source of your dataset.
  • Dataset: data that is imported into the Data Library.