(warning) The Data Prep (Paxata) documentation is now available on the DataRobot public documentation site. See the Data Prep section for user documentation and connector information. After the 2021.2 SP1 release, the content on this site will be removed and replaced with a link to the DataRobot public documentation site.

Find and Replace

Overview

Find and Replace allows you to locate and replace text within specified columns. You can also find and replace across multiple or all columns. See Find and Replace across multiple columns later in this article for details. The purpose of this article is to explain the steps that are required for a find and replace operation, and to provide you with some simple examples of how to optimize this feature for your data.


Steps for Find and Replace


  1. Find field: specify a value to find. Alternatively, double-click a cell that has the value you want to find; the fields for find and replace are then automatically populated with the value from that cell.

  2. Using: specify how to match the text that you specify in the "find" field. There are four matching methods:
    • Contains: the match can be anywhere in the cell value.
    • Starts With: the match must be at the beginning of the cell value.
    • Equals: exact match.
    • Ends With: the match must be at the end of the cell value.

  3. Replace: specify what portion of the match to replace. There are two options for replacing matching text:
    • the entire cell: replaces the entire contents of the cell.
    • the match: replaces only the matched portion of the cell.

Example 1: original cell value = "123456"
Find: "123" and replace with Match "321" and the result is: "321456"

Example 2: original cell value = "123456"
Find: "123" and replace Entire Cell with "321" and the result is: "321"

4. Save your change for this operation. Or, if you want to continue making replacements and save all of the find and replacement transformations to the current Step,don't save here but, instead, click the plus (+) to continue. See the next section for how multiple transformations operate on your data.


PaxPro tip: in screenshot above, the "Save" button is displayed in theFind and Replacemenu because the Steps panel is turned off. Typically, one turns the Steps panel off to provide more real estate when working with data on the grid. For convenience, the Save button then displays on the operations screen as displayed above. When you enable the Steps panel, the Save button is located there instead—the same "Save" button, just a different location:


Multiple Find and Replace operations

You can continue with additional find and replace operations for the selected column(s) by clicking the plus button, which opens another set of find and replace fields. Each additional find and replace operation is iterative—meaning that a subsequent find operates on the results of the previous find and replace operation, and all of the transformations are applied in order, from top to bottom as they are displayed in the Find and Replace panel.

With the hospital example, we contained with the transformations because we wanted everything on the same Step in our Project. The next transformation we made was to find all instances of "memorial" and then replace the entire cell—not just the matched text—with a null value:


Another example:

find "Detroit" and replace with "San Francisco"
find "San Francisco" and replace with "San Jose"
The result is that "Detroit" is transformed into "San Jose"

Note that you can rearrange the sequence of your transformations at any time by dragging and dropping a find and replace operation to another position in the panel.



To save all find and replace transformations for this Step, click Save. Or to remove any find and replace operation for this Step, click the minus button for that operation.


Important notes for Find and Replace

  • by default, capitalization is ignored in the search. For example, the text cat matches the text CaT. If capitalization is important, deselect the Ignore case checkbox.

  • by default, you are limited to 250 find and replace operations in a single Project Step. If you exceed this number, an error message is displayed in the Step editor panel and you cannot continue find and replace transformations on the column. Contact your system administrator if you need to increase the limit.

  • the Highlight feature on the grid is only enabled for a single Find and Replace transformation; it is disabled if you add another Find and Replace transformation.


Find and Replace across multiple columns

Do you need to perform a quick search and replace across your entire dataset—or even across just a specific set of columns in your dataset? If so, then the advanced find and replace tool is the right tool for you to get this job done. If you're still not sure this is the right tool for what you need to do, here are some common examples of when Paxata users perform an advanced find and replace operation:

  • the dataset has both "incorporated" and "Inc" everywhere. You want to standardize the entire dataset to have only the "Inc" value.
  • the dataset has "incorporated" everywhere, and for the most part that's accurate. But you need to change the value to "Inc" for some specific columns in the dataset.
  • you've pulled two datasets into your Project—one has "NA" and the other uses blanks to represent non-applicable values. You want to change all of the "NA" values into blanks.

The advanced find and replaced menu is found under the same column menu for find and replace:

  1. Open the Find + Replace menu from any column's drop-down menu.
  2. Click the drop-down adjacent to the column's name in the Find + Replace On field.
  3. The advanced column picker menu opens.



In the advanced column picker menu, click the check box adjacent to the columns that you want to select for the Find and Replace operation. The steps for Find and Replace across multiple columns are the same as the steps for replacing a single column. The UI help panel provides these steps. In addition, from the advanced menu, you can select multiple columns by either Name or Criteria.


The selection options are significant because they determine how your data will be dynamically updated with the change:

  • by Name: applies the replace operation only to the specific columns you select.
    To select columns byName:
    • click the check box adjacent to the column(s) that you want to select.
    • click the top-most check box to select all columns.
    • use the Columns and Types filters at the top of the panel to quickly filter down to the columns you want to select for the operation.
    • use the search function to locate a column by name.

  • by Criteria: applies the replace operation to any column that meets the criteria you specify. For example, if you have String type columns in your dataset and you specify the replace operation for String type columns, then all existing columns of this type in your dataset—and any new String type columns that are introduced to the dataset prior to this Step—will be dynamically replaced.
    To select columns based on criteria:
    • optionally specify the data type of the column—Boolean, DateTime, Number or String.
    • optionally specify the pattern for the column name—contains, starts with, equals or ends with.
    Notice the header message updates to indicate the number of columns you have selected based on that criteria. You may later notice the number of selected columns increases or decreases if new data is brought into an earlier Step that introduces or removes columns that meet your criteria.
When you're finished selecting your replace options, click Save in the Steps panel.

Note: if you switch between the Name and Criteria options before saving the replace operation, your selections are remembered and a link to "Restore last selection" returns you to your initial selection method.