Import Datasets
Use
This article helps you learn how to import one or more datasets from your local folders and connected data sources into Paxata’s Library.
Contents
- Introduction
- Select datasets
- Overview of the Select Datasets panel
- Select a dataset from a data source
- Select a dataset from a local file
- Search for datasets
- Combine datasets
- Preview a dataset before import
- Add a dataset again
- Adjust the import settings
- Glossary
Introduction
Importing data into Paxata® is the first step to getting your data ready for analytics. During the import process you can:
- Select multiple datasets from a variety of data sources.
- Combine datasets together into one dataset.
- Choose which columns in a dataset to import.
- Select extensionless files.
- Import datasets from a zipped or compressed folder.
- Change the format used to analyze and structure your data.
Video overview of import
This video provides a high-level overview of how import works.
Overview of the Import screen
After you select a dataset for import, the screen divides into four quadrants called panels. The following is an overview of each panel.
Element | Description |
---|---|
Select Datasets Panel | This is where you select the datasets you want to import. From here you can:
|
You Selected Panel | After you select a dataset, your dataset is listed in this panel. From here you can:
|
Your Options Panel | In many cases your data will import easily into Paxata. In some cases, you may need or want to adjust the import options. This is where you make those adjustments. |
Preview Panel | This panel shows a preview of your data. As you select datasets from the You Selected panel, update the format, or change import options, the preview displays how the selected dataset will look once imported. From here, you can also choose which columns to import. |
Snapshot of the import process
The following is a quick snapshot of how to import your datasets into Paxata.
Step | Action |
---|---|
1 | On the Library screen, click Import. Result: The Import screen appears. |
2 | Select the dataset for import. For more information see the following related sections of this article: |
3 | Check the preview of the dataset. Does your data look correct:
|
4 | Click Finish. Result: Your data is imported as a dataset and ready to be prepped in a Project. |
Select datasets
Datasets can be imported from a local file or a connected data source. This section provides more detail on how to select one or more datasets for import.
Overview of the Select Datasets panel
The following provides information on the elements of the Select Datasets panel.
Element | Description |
---|---|
Datasets Lists | The contents of the data source are listed here. In the above example you can see the data source has six items: five comma-separated value (CSV) files and one unknown file type. |
Data Source Options | Maybe you need to import a dataset from Amazon S3, Hadoop, JDBC, or some other data source; maybe you just want to import a spreadsheet you saved on your computer. Either way, this is where you do it.
|
Search | For times when you want to find a specific dataset or a group of similar datasets, you can enter search criteria. The Search field accepts wildcard characters. This will help you find specific and similarly named datasets. For more information on searching, see the Search for datasets section of this article. |
Select | When you see a dataset you want to import, click this button. The dataset will be listed in the You Selected panel and will be imported when you click Finish. |
Select datasets from a data source
Follow these steps to select a dataset from a connected data source.
Step | Action |
---|---|
1 | Select the data source from the Select Data Source drop-down. |
2 | Locate the dataset you want to import. To locate your dataset using search, see the Search for datasets section of this article. |
3 | To select a dataset, click Select. Result: The dataset is added to the list in the You Selected panel. Paxata displays the Your Options panel for the dataset and a preview of the dataset. |
4 | To add more datasets:
|
Select datasets from a local file
Follow these steps to import a dataset from a file on your computer or shared network drive.
Step | Action |
---|---|
1 | Click Upload Local File. |
2 | To upload a dataset:
|
3 | To add more datasets, click any additional dataset you want to include in the import. Result:The additional datasets are added to the list in the You Selected panel. |
Search for datasets
Datasets can be searched for by typing the name of the dataset or entering a query string. The search is case sensitive and only the results that exactly match your search criteria are returned. Wildcard characters can be used to locate a dataset when you aren’t sure what the exact name is or to locate similarly named datasets. This section provides more information on how to search or query for a dataset.
Search for a dataset
Follow these steps to search for a dataset.
Step | Action |
---|---|
1 | Click Search. Result: The Wildcard Search field appears. |
2 | Type your search criteria in the Wildcard Search field. To search with wildcard characters, see the Wildcard characters section of this article. Result: Datasets that exactly match your search criteria are returned. |
Query a database
Follow these steps to query a database.
Step | Action |
---|---|
1 | Click Create Query. Result: The Query String field appears. |
2 | Type your search criteria in the Query String field. To search with wildcard characters, see the Wildcard characters section of this article. Result: Datasets that exactly match your query string are returned. |
Wildcard characters
The following are the wildcard characters you can use to search for datasets.
Character | Matches |
---|---|
* | Any number of characters, including none. |
? | A single character. |
[123] or [abc] | A character listed inside the bracket. |
[0-9] or [a-z] | A character in the range given in the bracket. |
Example searches using wildcards
Here are some example searches and the results:
Search example | Returns |
---|---|
* | All the datasets. |
*.csv | Datasets with a ‘.csv’ file extension. |
a?c.csv | Datasets that are named aac.csv, abc.csv, …, azc.csv. |
a*z.csv | Datasets that begin with a lowercase ‘a’ and end with ‘z.csv’ regardless of what characters or how many characters are between. |
a[0-9].csv | Datasets that are named a0.csv, a1.csv, a2.csv, …, a9.csv. |
a[a-z].csv | Datasets that are named aa.csv, ab.csv, …, az.csv. |
a[abc].csv | Datasets that are named aa.csv, ab.csv, ac.csv. |
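The wildcard rules above follow the same conventions as shell-style pattern matching, so they can be tried out with Python's standard `fnmatch` module. This is only a sketch for experimenting with patterns, not Paxata's search implementation. The `search` helper and the dataset names are made up for illustration, and `fnmatchcase` is used because the search is described as case sensitive.

```python
from fnmatch import fnmatchcase

def search(names, pattern):
    """Return the names that match the whole pattern, case-sensitively."""
    return [name for name in names if fnmatchcase(name, pattern)]

# Hypothetical dataset names standing in for the contents of a data source.
datasets = ["a0.csv", "a1.csv", "aa.csv", "ab.csv", "abz.csv", "Az.csv", "notes.txt"]

print(search(datasets, "*.csv"))       # every name ending in .csv
print(search(datasets, "a[0-9].csv"))  # 'a' followed by exactly one digit
print(search(datasets, "a[abc].csv"))  # 'a' followed by one of a, b, or c
print(search(datasets, "a*z.csv"))     # starts with lowercase 'a', ends with 'z.csv'
```

Note that `Az.csv` is not returned by `a*z.csv` because the match is case sensitive.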
Combine datasets
Paxata can combine multiple datasets into one glob to be imported. A glob is the result of appending multiple datasets into one dataset during import. This section provides more information on how to glob multiple datasets together prior to import.
Guidelines for combining datasets
The following are some guidelines that help make globbing datasets a success.
- Datasets can only be globbed from the same data source.
- Datasets can only be globbed through a wildcard search.
- Datasets should have the same structure (number of columns and type of data).
Data sources that support globbing
For a list of data sources and file formats that are supported for globbing, review the Platform Support matrix in the current Release Notes.
Note: To access the Release Notes, you will need the following:
- Access to the internet.
- A Paxata Customer Account.
If you do NOT have a Paxata Customer Account, contact the Paxata Service Desk.
Create a Glob
Follow these steps to combine multiple datasets into one glob.
Step | Action |
---|---|
1 | Select the data source from which to import. |
2 | Use Search to locate the datasets you want to combine. See the Search for datasets section of this article for more information. |
3 | Click Combine All Results. Result: The datasets are combined into one glob. The glob is added to the datasets list in the You Selected panel. The name of the glob defaults to the search criteria. Paxata displays the Your Options panel for the glob and a preview of the glob. |
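Conceptually, a glob appends the rows of every dataset that matches the search pattern. The following Python sketch illustrates that idea under stated assumptions. It is not how Paxata performs the operation: the `combine` function, the `files` dictionary (file name mapped to CSV text), and the strict header check are all hypothetical stand-ins chosen for illustration.

```python
import csv
import io
from fnmatch import fnmatchcase

def combine(files, pattern):
    """Append every file whose name matches `pattern` into one set of rows.

    `files` maps a file name to its CSV text (a stand-in for one data source).
    """
    header = None
    rows = []
    for name in sorted(files):
        if not fnmatchcase(name, pattern):
            continue
        reader = csv.reader(io.StringIO(files[name]))
        file_header = next(reader)
        if header is None:
            header = file_header
        elif file_header != header:
            # Assumption: treat a differing header as a structure mismatch.
            raise ValueError(f"{name} has a different structure: {file_header}")
        rows.extend(reader)
    # The name defaults to the search criteria, as described in step 3.
    return {"name": pattern, "header": header, "rows": rows}

files = {
    "sales_jan.csv": "id,amount\n1,10\n2,20\n",
    "sales_feb.csv": "id,amount\n3,30\n",
    "returns.csv": "id,reason\n9,damaged\n",
}
combined = combine(files, "sales_*.csv")
print(combined["name"], combined["header"], len(combined["rows"]))
```

Calling `combine(files, "*.csv")` raises an error in this sketch because `returns.csv` has different columns, which mirrors the guideline that globbed datasets should share the same structure.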
Preview a dataset before import
To change the dataset in the preview, from the You Selected panel, click the dataset you want to preview.
Result: The Preview panel displays the selected dataset.
Note: By default, Paxata displays a preview of the last selected dataset.
Add a dataset again
During import, you may find times when you need to apply different import options to the same dataset. This is especially true when you need to import more than one Excel worksheet from the same Excel file.
Follow these steps to add a dataset with different import options.
Step | Action |
---|---|
1 | From the You Selected panel, click the More button (three vertical dots) of the dataset you want to add again. |
2 | Click Add Again. Result: The dataset is added to the list in the You Selected panel. |
3 | Adjust the import settings as needed. See the Adjust the import settings section of this article for more information. |
Adjust the import settings
Once a dataset is selected, Paxata analyzes your data to determine the right settings for the best results. But data isn’t a one-size-fits-all kind of thing. Sometimes, you need to tweak the settings to get them just right. This section provides information on how to adjust some of the more universal settings of a dataset prior to import. For specific information about a setting, hover your cursor over the help tip (question mark) button.
The following are a few of the more frequently used basic settings you can adjust:
To ... | Then ... |
---|---|
Add a tag | In the Your Options panel, type or select the tag from the Tags drop-down. |
Add a column to show the source file lineage | In the Your Options panel, toggle the 'Add column to show source file' button. Result: The new Source File column will be added to the end of the dataset showing the path of the source file for every imported row. |
Change the format of the dataset | In the You Selected panel, select the format you want to apply to the dataset from the Format drop-down. See the "Supported Formats" table below for more information. |
Change the name of a dataset | In the Your Options panel:
|
Exclude columns from import | In the Preview panel:
|
Import additional worksheets from the same Excel file | For each additional worksheet:
|
Rearrange the columns | In the Preview panel:
|
Rename a column | In the Preview panel:
|
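The effect of the 'Add column to show source file' option in the table above can be pictured with a small Python sketch. It only illustrates the described result (a Source File column at the end of the dataset, holding the source path on every row). The `add_source_file_column` function and the example path are hypothetical and do not reflect Paxata internals.

```python
def add_source_file_column(header, rows, source_path):
    """Return a copy of the data with a trailing 'Source File' column."""
    new_header = header + ["Source File"]
    new_rows = [row + [source_path] for row in rows]
    return new_header, new_rows

header, rows = add_source_file_column(
    ["id", "amount"],
    [["1", "10"], ["2", "20"]],
    "/data/sales_jan.csv",  # hypothetical source path
)
print(header)
print(rows)
```

This kind of lineage column is most useful for a glob, where rows from several files end up in one dataset.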
Supported Formats
For file-based connectors, the common formats are listed in the table below. For information on wildcard and globbing support, see “Wildcards and Globbing”. Paxata's Intelligent Ingest identifies the format of the file by looking into the contents of the file instead of relying on the file extension. Even if your file does not have an extension or has an incorrect extension, Paxata will correctly identify the format.
Common Format | Import support for wildcards and globbing |
---|---|
Delimited files | Yes |
Fixed-width column data | Yes |
JSON | Yes |
XML | Yes |
Apache Avro | Yes |
Microsoft Excel | No. See "Wildcards and Globbing" |
SAS BDAT | Yes |
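To give a feel for what identifying a format from file contents rather than the extension can look like, here is a rough Python sketch. It is an assumption-laden illustration, not Intelligent Ingest itself: the `guess_format` function is hypothetical, it covers only three text formats (JSON, XML, delimited), and a real detector would also need to inspect binary formats such as Excel or Avro.

```python
import csv
import json
import xml.etree.ElementTree as ET

def guess_format(text):
    """Guess a text format from its content, ignoring any file extension."""
    sample = text.lstrip()
    if sample.startswith(("{", "[")):
        try:
            json.loads(sample)
            return "JSON"
        except ValueError:
            pass  # looked like JSON but did not parse; keep checking
    if sample.startswith("<"):
        try:
            ET.fromstring(sample)
            return "XML"
        except ET.ParseError:
            pass
    try:
        # csv.Sniffer infers the delimiter from a consistent per-line count.
        csv.Sniffer().sniff(sample, delimiters=",;\t|")
        return "Delimited"
    except csv.Error:
        return "Unknown"

print(guess_format('{"id": 1, "name": "Ann"}'))
print(guess_format('<rows><row id="1"/></rows>'))
print(guess_format("id,name\n1,Ann\n2,Bob\n"))
```

Because the function never looks at a file name, the same text is classified the same way whether it arrives as `data.csv`, `data.txt`, or with no extension at all.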
Paxata supports the import of compressed files in one of the following formats: Deflate, LZ4, Snappy, ZIP, Gzip, or Bzip. In general, the decompressed file must be a common format as listed in the previous table. Additionally, connectors that support Parquet files also support compressed versions of Parquet files.
Note: When importing a ZIP file that contains multiple files, the largest file in the compressed set is automatically identified and selected for import to the Library.
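If you want to check ahead of time which file in a ZIP archive is the largest, Python's standard `zipfile` module can list the members and their sizes. The sketch below is an illustration only. The `largest_member` function is hypothetical, and it assumes "largest" means the largest uncompressed size, which the note above does not specify.

```python
import io
import zipfile

def largest_member(zip_bytes):
    """Return the name of the largest file (by uncompressed size) in a ZIP."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        files = [info for info in archive.infolist() if not info.is_dir()]
        if not files:
            return None
        return max(files, key=lambda info: info.file_size).filename

# Build a small archive in memory so the example is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("readme.txt", "notes")
    archive.writestr("orders.csv", "id,amount\n" + "1,10\n" * 100)

print(largest_member(buffer.getvalue()))
```

If the file you actually want is not the largest one, extracting it from the archive before import avoids relying on the automatic selection.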
Glossary
The following are definitions for terms used in this document.
Term | Definition |
---|---|
AnswerSet | Like a dataset except that it is the published result of your data prep. |
Base dataset | The data on which all other action in the Project will be performed. |
Data source | The source of your dataset. |
Dataset | Data that is imported into the Data Library is called a dataset. |
Filtergram | The combination of the functionality of filters with the power of histograms. |
Glob | The result of combining multiple datasets into one dataset during import. |