Use

This article helps you learn how to import one or more datasets from your local folders and connected data sources into Paxata’s Library.



Introduction

Importing data into Paxata® is the first step in getting your data ready for analytics. During the import process, you can:

  • Select multiple datasets from a variety of data sources.
  • Combine datasets together into one dataset.
  • Choose which columns in a dataset to import.
  • Select extensionless files.
  • Import datasets from a zipped or compressed folder.
  • Change the format used to analyze and structure your data.

Video overview of import

This video provides a high-level overview of how import works.


Overview of the Import screen

After you select a dataset for import, the screen divides into four quadrants called panels. The following is an overview of each panel.





Element | Description
Select Datasets Panel

This is where you select the datasets you want to import. From here you can:

  • Select multiple datasets from local files and connected data sources.
  • Search and query connected data sources for datasets.
  • Combine multiple datasets into one glob for import.
You Selected Panel

After you select a dataset, your dataset is listed in this panel. From here you can:

  • See a list of datasets that have been selected for import.
  • Select a dataset to preview or update the import options.
  • Quickly identify datasets with potential import errors.
  • Change the format used to analyze and structure your data for import.
  • Import the same dataset multiple times with different import options.
Your Options Panel

In many cases, your data imports easily into Paxata. In some cases, you may need or want to adjust the import options. This is where you make those adjustments.
Preview Panel

This panel shows a preview of your data. As you select datasets from the You Selected panel, update the format, or change import options, the preview displays how the selected dataset will look once imported. From here, you can also choose which columns to import.

Snapshot of the import process

The following is a quick snapshot of how to import your datasets into Paxata.

Step | Action
1

On the Library screen, click Import.

Result: The Import screen appears.
2

Select the dataset for import. For more information, see the Select datasets section of this article.

3

Check the preview of the dataset. Does your data look correct?

If ... | Then ...
Yes | Repeat steps 2 and 3 until you have selected all the datasets you want to import.
No | Try adjusting the import settings. See the Adjust the import settings section of this article.
4

Click Finish.

Result: Your data is imported as a dataset and ready to be prepped in a Project.

Select datasets

Datasets can be imported from a local file or a connected data source. This section provides more detail on how to select one or more datasets for import.

Overview of the Select Datasets panel

The following describes the elements of the Select Datasets panel.



Element | Description
Datasets Lists

The contents of the data source are listed here. In the above example, the data source has six items: five comma-separated value (CSV) files and one file of unknown type.
Data Source Options

Whether you need to import a dataset from Amazon S3, Hadoop, JDBC, or some other data source, or just want to import a spreadsheet saved on your computer, this is where you do it.

  • The Data Source drop-down list lets you select a configured data source.
  • The Upload Local File option lets you select a dataset from your computer.
Search

When you want to find a specific dataset or a group of similarly named datasets, enter search criteria here. The Search field accepts wildcard characters. For more information on searching, see the Search for datasets section of this article.
Select

When you see a dataset you want to import, click this button. The dataset is listed in the You Selected panel and is imported when you click Finish.

Select datasets from a data source

Follow these steps to select a dataset from a connected data source.


Step | Action
1

Select the data source from the Select Data Source drop-down.
2

Locate the dataset you want to import. To locate your dataset using search, see the Search for datasets section of this article.
3

To select a dataset, click Select.

Result: The dataset is added to the list in the You Selected panel. Paxata displays the Your Options panel for the dataset and a preview of the dataset.
4

To add more datasets:

From ... | Then ...
The currently selected data source | Click any additional dataset you want to include in the import.
A different data source | Repeat steps 1 through 3 for each data source.

Result: The additional datasets are added to the datasets list in the You Selected panel.


Select datasets from a local file

Follow these steps to import a dataset from a file on your computer or shared network drive.

Step | Action
1

Click Upload Local File.
2

To upload a dataset, do one of the following:

  • Click the Upload File panel and select the dataset.
  • Drag and drop the file into the Upload File panel.
Result: The dataset is added to the list in the You Selected panel. Paxata displays the Your Options panel for the dataset and a preview of the dataset.
3

To add more datasets, click any additional dataset you want to include in the import.

Result:The additional datasets are added to the list in the You Selected panel.

Search for datasets

You can search for datasets by typing the name of the dataset or by entering a query string. The search is case-sensitive, and only results that exactly match your search criteria are returned. Use wildcard characters to locate a dataset when you aren’t sure of its exact name, or to locate similarly named datasets. This section provides more information on how to search or query for a dataset.


Search for a dataset

Follow these steps to search for a dataset.

Step | Action
1

Click Search.

Result: The Wildcard Search field appears.
2

Type your search criteria in the Wildcard Search field. To search with wildcard characters, see the Wildcard characters section of this article.

Result: Datasets that exactly match your search criteria are returned.

Query a database

Follow these steps to query a database.

Step | Action
1

Click Create Query.

Result: The Query String field appears.
2

Type your search criteria in the Query String field. To search with wildcard characters, see the Wildcard characters section of this article.

Result: Datasets that exactly match your query string are returned.

Wildcard characters

The following are the wildcard characters you can use to search for datasets.

Character | Matches
* | Any number of characters, including none.
? | A single character.
[123] or [abc] | A character listed inside the brackets.
[0-9] or [a-z] | A character in the range given in the brackets.

Example searches using wildcards

Here are some example searches and the results:

Search example | Returns
* | All the datasets.
*.csv | Datasets with a ‘.csv’ file extension.
a?b.csv | Datasets that are named aab.csv, abb.csv, …, azb.csv.
a*z.csv | Datasets that begin with a lowercase ‘a’ and end with ‘z.csv’, regardless of which characters or how many characters are in between.
a[0-9].csv | Datasets that are named a0.csv, a1.csv, a2.csv, …, a9.csv.
a[a-z].csv | Datasets that are named aa.csv, ab.csv, …, az.csv.
a[abc].csv | Datasets that are named aa.csv, ab.csv, ac.csv.
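
To get a feel for these patterns before typing them into the search field, you can try them outside Paxata. The following is a minimal sketch that uses Python's standard fnmatch module, which follows the same shell-style conventions for *, ?, and bracket expressions. It is an approximation only: Paxata's own matching rules may differ in details, and the dataset names are hypothetical.

```python
from fnmatch import fnmatchcase


def search_datasets(names, pattern):
    """Return the names that match the whole pattern, case-sensitively."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical dataset names, used only for illustration.
datasets = ["a1.csv", "ab.csv", "abb.csv", "Ab.csv", "a-to-z.csv", "notes.txt"]

for pattern in ["*.csv", "a?b.csv", "a[0-9].csv", "a[abc].csv", "a*z.csv"]:
    print(pattern, "->", search_datasets(datasets, pattern))
```

Because the comparison is case-sensitive, a[abc].csv matches ab.csv but not Ab.csv, and a?b.csv requires exactly one character between ‘a’ and ‘b.csv’.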


Combine datasets

Paxata can combine multiple datasets into one glob to be imported. A glob is the result of appending multiple datasets into one dataset during import. This section provides more information on how to glob multiple datasets together prior to import.

Guidelines for combining datasets

The following guidelines help make globbing datasets a success.

  • Datasets can only be globbed from the same data source.
  • Datasets can only be globbed through a wildcard search.
  • Datasets should have the same structure (number of columns and type of data).

Data sources that support globbing

For a list of data sources and file formats that are supported for globbing, review the Platform Support matrix in the current Release Notes.

Note: To access the Release Notes, you will need the following:
  • Access to the internet.
  • A Paxata Customer Account.

If you do NOT have a Paxata Customer Account, contact the Paxata Service Desk.

Create a Glob

Follow these steps to combine multiple datasets into one glob.

Step | Action
1

Select the data source from which to import.
2

Use Search to locate the datasets you want to combine. See the Search for datasets section of this article for more information.
3

Click Combine All Results.

Result: The datasets are combined into one glob. The glob is added to the datasets list in the You Selected panel. The name of the glob defaults to the search criteria. Paxata displays the Your Options panel for the glob and a preview of the glob.
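
Conceptually, a glob appends the rows of every matching dataset under a single set of columns. The following sketch illustrates that idea in Python for CSV text held in memory. It is not how Paxata implements globbing: the function name combine_datasets, the files dictionary (a stand-in for one data source), and the optional source-file column are all assumptions made for illustration.

```python
import csv
import io
from fnmatch import fnmatchcase


def combine_datasets(files, pattern, add_source_column=False):
    """Append every CSV whose name matches `pattern` into one table.

    `files` maps a file name to its CSV text; each file is assumed to
    start with a header row. Returns (glob_name, header, rows).
    """
    header = None
    rows = []
    for name in sorted(files):
        if not fnmatchcase(name, pattern):
            continue
        reader = csv.reader(io.StringIO(files[name]))
        file_header = next(reader)
        if header is None:
            header = file_header
        elif file_header != header:
            # Mirrors the guideline that globbed datasets share one structure.
            raise ValueError(f"{name} does not have the same columns")
        for row in reader:
            rows.append(row + [name] if add_source_column else row)
    if header is None:
        raise ValueError(f"no dataset matches {pattern!r}")
    if add_source_column:
        header = header + ["Source File"]
    # Like the glob in the steps above, the name defaults to the search criteria.
    return pattern, header, rows


# Hypothetical files in a single data source.
files = {
    "sales_2019.csv": "region,amount\nEast,10\nWest,20\n",
    "sales_2020.csv": "region,amount\nEast,30\n",
    "inventory.csv": "sku,count\nA1,5\n",
}

name, header, rows = combine_datasets(files, "sales_*.csv", add_source_column=True)
print(name, header)
for row in rows:
    print(row)
```

The header check is the reason the guidelines ask for datasets with the same structure: rows from inventory.csv would not line up with the sales columns, so a pattern that matched it would fail here rather than produce a misaligned table.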

Preview a dataset before import

To change the dataset in the preview, from the You Selected panel, click the dataset you want to preview.

Result: The Preview panel displays the selected dataset.

Note: By default, Paxata displays a preview of the last selected dataset.


Add a dataset again

During import, you may need to apply different import options to the same dataset. This is especially true when you need to import more than one Excel worksheet from the same Excel file.


Follow these steps to add a dataset with different import options.

Step | Action
1

From the You Selected panel, click the More button (three vertical dots) of the dataset you want to add again.

2

Click Add Again.

Result: The dataset is added to the list in the You Selected panel.
3

Adjust the import settings as needed. See the Adjust the import settings section of this article for more information.

Adjust the import settings

Once a dataset is selected, Paxata analyzes your data to determine the right settings for the best results. But data isn’t a one-size-fits-all kind of thing. Sometimes, you need to tweak the settings to get them just right. This section provides information on how to adjust some of the more universal settings of a dataset prior to import. For specific information about a setting, hover your cursor over the help tip (question mark) button.


The following are a few of the frequent and more basic settings you can adjust:

To ... | Then ...
Add a tag

In the Your Options panel, type or select the tag from the Tags drop-down.
Add a column to show the source file lineage

In the Your Options panel, toggle the 'Add column to show source file' button.

Result: The new Source File column will be added to the end of the dataset showing the path of the source file for every imported row.

Change the format of the dataset

In the You Selected panel, select the format you want to apply to the dataset from the Format drop-down. See the "Supported Formats" table below for more information.
Change the name of a dataset

In the Your Options panel:

  1. Type the new name in the Name field.
  2. Click Show Preview.
Result: The dataset name is updated in the preview.
Exclude columns from import

In the Preview panel:

  1. Click Edit Columns.
  2. Deselect the columns you don’t want to import.
  3. Click Show Preview.
Result: The Preview panel displays a preview of the dataset. The deselected columns are removed from the preview.
Import additional worksheets from the same Excel file

For each additional worksheet:

  1. In the You Selected panel, add the Excel file again. See the Add a dataset again section of this article.
  2. In the Your Options panel, select the worksheet to import from the Worksheet drop-down.
Rearrange the columns

In the Preview panel:

  1. Click Edit Columns.
  2. Click the up arrow or down arrow until the column is in the position you want.
  3. Click Show Preview.
Rename a column

In the Preview panel:

  1. Click Edit Columns.
  2. Click Edit (pencil icon) and type the new name.
  3. Click Show Preview.
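
The three column edits described above (exclude, rearrange, rename) amount to choosing which columns to keep, in what order, and under what names. The following Python sketch shows that logic on a small in-memory table. The function edit_columns and the sample data are hypothetical and only illustrate the effect you should expect to see in the preview.

```python
def edit_columns(header, rows, keep, rename=None):
    """Return a new (header, rows) holding only the `keep` columns, in that order.

    `rename` optionally maps an original column name to its new name.
    """
    rename = rename or {}
    # header.index raises ValueError if a requested column does not exist.
    indexes = [header.index(column) for column in keep]
    new_header = [rename.get(column, column) for column in keep]
    new_rows = [[row[i] for i in indexes] for row in rows]
    return new_header, new_rows


# Hypothetical dataset preview.
header = ["id", "first", "last", "notes"]
rows = [["1", "Ada", "Lovelace", "x"], ["2", "Alan", "Turing", "y"]]

new_header, new_rows = edit_columns(
    header,
    rows,
    keep=["last", "first", "id"],   # drops "notes" and reorders the rest
    rename={"id": "customer_id"},   # renames one of the kept columns
)
print(new_header)
print(new_rows)
```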

Supported Formats

For file-based connectors, the common formats are listed in the table below. For information on wildcard and globbing support, see “Wildcards and Globbing”. Paxata's Intelligent Ingest identifies the format of the file by looking into the contents of the file instead of relying on the file extension. Even if your file does not have an extension or has an incorrect extension, Paxata will correctly identify the format.

Common Format | Import support for wildcards and globbing
Delimited files (CSV, TSV, etc.) | Yes
Fixed-width column data | Yes
JSON | Yes
XML | Yes
Apache Avro | Yes
Microsoft Excel (XLS, XLSX) | No. See "Wildcards and Globbing"
SAS BDAT | Yes
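
To illustrate what identifying a format from file contents can look like, the sketch below checks the leading bytes of a file against a few well-known signatures and ignores the file name entirely. This is not Paxata's Intelligent Ingest algorithm, which is not described in this article. The function name sniff_format and the returned labels are assumptions, and a real detector would inspect far more than the first few bytes.

```python
def sniff_format(data: bytes) -> str:
    """Guess a file format from its leading bytes, ignoring the file name."""
    signatures = [
        (b"PK\x03\x04", "ZIP container (includes XLSX)"),
        (b"\x1f\x8b", "Gzip"),
        (b"BZh", "Bzip2"),
        (b"\xd0\xcf\x11\xe0", "OLE2 container (includes XLS)"),
        (b"Obj\x01", "Apache Avro"),
        (b"PAR1", "Parquet"),
    ]
    for magic, label in signatures:
        if data.startswith(magic):
            return label
    # Skip a UTF-8 byte order mark and leading whitespace before checking text formats.
    text = data.lstrip(b"\xef\xbb\xbf \t\r\n")
    if text.startswith(b"<"):
        return "XML"
    if text.startswith((b"{", b"[")):
        return "JSON"
    return "delimited or fixed-width text"


# The same content is classified identically whatever the file is called.
print(sniff_format(b'{"region": "East", "amount": 10}'))
print(sniff_format(b"<?xml version='1.0'?><rows/>"))
print(sniff_format(b"region,amount\nEast,10\n"))
```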

Paxata supports the import of compressed files in one of the following formats: Deflate, LZ4, Snappy, ZIP, Gzip, or Bzip. In general, the decompressed file must be a common format as listed in the previous table. Additionally, connectors that support Parquet files also support compressed versions of Parquet files.

Note: When importing a ZIP file that contains multiple files, the largest file in the compressed set is automatically identified and selected for import to the Library.
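
If you want to check ahead of time which file a multi-file ZIP is likely to contribute, you can inspect the archive yourself. The following sketch uses Python's standard zipfile module to find the largest member. It assumes that "largest" means the largest uncompressed size, which this article does not specify, and the function name largest_member is hypothetical.

```python
import io
import zipfile


def largest_member(zip_bytes: bytes) -> str:
    """Return the name of the largest file (by uncompressed size) in a ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        members = [info for info in archive.infolist() if not info.is_dir()]
        if not members:
            raise ValueError("archive contains no files")
        return max(members, key=lambda info: info.file_size).filename


# Build a small archive in memory for illustration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("readme.txt", "short")
    archive.writestr("data.csv", "id,value\n" + "1,2\n" * 100)

print(largest_member(buffer.getvalue()))
```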



Glossary

The following are definitions of terms used in this document.

Term | Definition
AnswerSet | Like a dataset, except that it is the published result of your data prep.
Base dataset | The data on which all other actions in the Project are performed.
Data source | The source of your dataset.
Dataset | Data that is imported into the Data Library.
Filtergram | A combination of the functionality of filters with the power of histograms.
Glob | The result of combining multiple datasets into one dataset during import.