Home of DataRobot Paxata User Documentation


The Data Prep (Paxata) documentation is now available on the DataRobot public documentation site. See the Data Prep section for user documentation and connector information. After the 2021.2 SP1 release, the content on this site will be removed and replaced with a link to the DataRobot public documentation site.

for release version 2020.2


 New to DataRobot Paxata? Here's where to get going fast to prep your data!


 Want a free trial? Start your Cloud Trial here

RELEASE NOTES: 2021.1


 Getting around in DataRobot Paxata

When you think about Paxata, think...

1. Projects where you do your data prep:

Project Tools are located on the left side

  • Steps editor allows you to view, edit, add, rearrange, and delete Steps within a Project.
  • Versions displays the versioning history for a Project.
  • Highlight provides visual cues that help you spot patterns, spaces, and ranges in your data.
  • Attach provides a Lookup tool to perform a join-type operation that combines another dataset with your Base (driving) dataset. Alternatively, you can append another dataset to your Base dataset.
  • Columns provides the ability to edit the Columns in a Project: their names, order, and availability.
  • Compute allows you to create a new column by applying any number of functions.
  • Window functions are a set of tools that enable you to group sets of rows (called a "Window") for the purpose of performing specific functions: Aggregate, Shift, Numbering, Fill.
  • Remove rows from a Project
  • Shape tools allow you to Deduplicate, Group By, Transpose, Pivot, and Depivot your data.
  • Auto # adds a new column that provides a unique index number for every row in your dataset. This is useful when you need to track the dataset's original order and/or assign row IDs.
  • Predict tool allows you to send your data to a deployed machine learning model in DataRobot and score it.
  • New Lens allows you to create publishing points from Steps in your Project.
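
To make a few of these tools concrete, here is a minimal plain-Python sketch of what Auto #, the Attach Lookup, and the Window Fill function do conceptually. It is an illustration only, not Paxata's implementation or API: the function names (auto_number, lookup, fill_down), the column names, and the exact matching and fill rules are assumptions made for this example.

```python
def auto_number(rows, column="row_id"):
    """Auto # idea: add a 1-based index that records each row's original position."""
    return [{**row, column: i} for i, row in enumerate(rows, start=1)]


def lookup(base_rows, other_rows, key):
    """Lookup idea: keep every Base row and add columns from a matching row.

    Assumption for this sketch: the first matching row wins, and columns
    already present in the Base dataset are never overwritten.
    """
    index = {}
    for row in other_rows:
        index.setdefault(row[key], row)
    joined = []
    for row in base_rows:
        match = index.get(row[key], {})
        extra = {k: v for k, v in match.items() if k not in row}
        joined.append({**row, **extra})
    return joined


def fill_down(rows, partition_key, column):
    """Window Fill idea: within each group of rows (the "Window"),
    replace a missing value with the last non-missing value seen."""
    last_seen = {}
    filled = []
    for row in rows:
        group = row[partition_key]
        value = row[column]
        if value is None:
            value = last_seen.get(group)
        else:
            last_seen[group] = value
        filled.append({**row, column: value})
    return filled


# Hypothetical sample data, used only for illustration.
orders = [
    {"customer": "a", "region": "east"},
    {"customer": "b", "region": None},
    {"customer": "a", "region": None},
]
customers = [{"customer": "a", "name": "Acme"}]

prepared = lookup(fill_down(auto_number(orders), "customer", "region"),
                  customers, "customer")
```

In this sketch the index is added first, so the original row order can still be recovered after later steps reorder or filter the data.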


2. Library, where the datasets that you prep are saved:

Library Actions



 Column Operations for your Projects

The following are operations you can perform by clicking the drop-down for any column. Note that most column operations are intuitive and the UI help panel in the application provides a brief definition for each one. For operations that require more explanation, you can find that information by clicking on the links below. 

Filter options

Sort options

  • by ascending
  • by descending

Change options

  • into Capital Case
  • into lower case
  • into UPPERCASE
  • into numeric
  • into text
  • into date
  • into unescaped HTML
  • into blanks
  • custom values

For all of the Change options, you can apply the selected change to more than one column. See the Change Menu for details.
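
As a rough illustration of applying one Change option to several columns at once, the following Python sketch uses standard-library string functions. It is not how the application implements these options: the change_columns helper and the keys in CHANGES are hypothetical, and treating "Capital Case" as str.title is an assumption.

```python
import html

# Hypothetical mapping from a Change option to a Python string function.
CHANGES = {
    "capital": str.title,          # assumed meaning of "Capital Case"
    "lower": str.lower,
    "upper": str.upper,
    "unescape_html": html.unescape,
}


def change_columns(rows, columns, change):
    """Apply one change to the selected columns, leaving other values as-is."""
    fn = CHANGES[change]
    return [
        {
            name: fn(value) if name in columns and isinstance(value, str) else value
            for name, value in row.items()
        }
        for row in rows
    ]


# Hypothetical sample data, used only for illustration.
rows = [{"city": "new york", "note": "AT&amp;T", "count": 3}]
capitalized = change_columns(rows, ["city"], "capital")
unescaped = change_columns(rows, ["note"], "unescape_html")
```

Non-text values, such as the count column here, pass through unchanged in this sketch.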

Column options

Whitespace options

  • trim leading and trailing spaces
  • collapse consecutive spaces

For both of the Whitespace options, you can apply the selected change to more than one column. See the Change Menu for details.
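
The two Whitespace options can be pictured with the short Python sketch below. The function names are hypothetical, and the sketch assumes that only the space character is affected (not tabs or line breaks); the application may behave differently.

```python
import re


def trim_spaces(value):
    """Remove leading and trailing spaces; inner spaces are untouched."""
    return value.strip(" ")


def collapse_spaces(value):
    """Replace each run of two or more spaces with a single space."""
    return re.sub(r" {2,}", " ", value)


# Hypothetical sample value, used only for illustration.
raw = "  Data   Prep  "
cleaned = collapse_spaces(trim_spaces(raw))
```

The two operations are independent: trimming does not collapse inner runs of spaces, and collapsing does not remove a single leading or trailing space.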

Additional column options



 Additional Application Features


  • Automatic Project Flows (APF): allows you to intelligently operationalize curated data flows.

  • Interactive Mode: allows you to define a portion of your datasets to work with interactively and efficiently inside a Paxata Project, and then apply all of the Project's transformations to all of the resulting data through Project Automation.

  • Update a Project's Datasets allows you to choose datasets to refresh in a Project. You can select individual datasets or all the datasets used in the Project. The result is that your Project data is updated to use the current version of the selected dataset(s).

  • Profile a Dataset

  • ClicktoPrep: for BI and visualization tools that support hyperlinks or URLs, you can create links that take you from your tools directly into the Paxata Project Filtergram or Paxata Project Step that generated or modified the data.