What you can do with Paxata
Paxata provides a clean, familiar, spreadsheet-like feel. The challenge of prepping data is simplified to single clicks for each action. This provides a point-and-click experience that empowers you to quickly gather data, simply explore and prepare it, and then easily share it.
|Quickly gather all your data into Paxata’s Library||Simply prepare your data in a Paxata Project – with clicks, not code||Easily publish your work as a Paxata AnswerSet™ for reliable analytics|
Video overview of Paxata
The following video provides a high-level overview of how you will use Paxata.
Tour the basics of Paxata
Here is an overview of the basic elements of Paxata. These elements are always available and provide quick access to common functions.
|Account Menu||Access account-specific options, like updating your password or logging out.|
|Help Toggle||Show or hide the Help Panel|
|Help Panel||Get helpful information related to the current screen|
|Navigation Menu||Navigate between the screens used to perform specific actions in Paxata. The primary screens are:|
- Library, where you access your imported and published data
- Projects, where you prepare your data
- Admin, where connections to data sources are made and user permissions are controlled
- Automation, where details for automated datasets and Projects are provided
Note: The screens available to each user are based on the user's permissions.
|Notification Bell||Know when Paxata encounters a warning or error|
Gather your data in the Library
The Library is where you gather your data. Data that is imported into the Library, such as an Excel spreadsheet, is called a dataset. Once you have imported a dataset, you can begin prepping your data in a Project. See the Prep your data in a Project section of this article. When you have finished prepping your data, you can publish it back to the Library as an AnswerSet. See the Share your prepped data as an AnswerSet section of this article.
Sources of datasets
Datasets can be imported from local files on your computer or from connected data sources. Some examples of connected data sources are:
- Cloud storage like Amazon S3
- The Hadoop Distributed File System (HDFS)
- Relational databases like MySQL
- Secure File Transfer Protocol (SFTP)
Import a local file
Follow these steps to import a dataset from your computer:
|1||On the Library screen, click + import|
Result: The Import Data screen appears
|2||Click + Upload local file |
Result: The Upload local file panel appears
|3||To upload a file, do one of the following:|
- Click the Upload local file panel and select the file, or
- Drag-and-drop the file into the Upload local file panel
Result: The Parsing screen appears. The Parsing screen displays a preview of how Paxata structured your data
|4||Check the preview. Does your data look correct?|
|If ...||Then ...|
|Yes||Continue to the next step|
|No||Try adjusting the import options|
|5||Click Finish |
Result: Your data is imported as a dataset and ready to be prepared
Tour the Library screen
Now that there's data in your Library, here are the main sections of the Library:
For complete details of all the information provided on this page and the actions you can take, see the Library article.
Prep your data in a Project
A Project is where you explore and prepare your data. The following is an overview of how you can explore and prepare your data in a Project.
|Explore your data using dynamic visuals to highlight patterns, duplicates, blanks, errors and missing data.|
|Clean your data by standardizing values, removing duplicates, finding and fixing errors, and more.|
|Shape your data using tools like pivot, transpose, group by and more|
|Combine additional datasets to enrich your data and provide more context.|
Video overview of Projects
The following video provides a high-level overview of what you do in a Project.
Start a new Project
There are two ways you can start a new Project:
- from the Projects screen where you start an empty Project and then add your data to it
- from the Library where you select the dataset you know you want to use as the starting point for your Project
To start a new Project from the Projects screen:
|1||On the Projects screen, click + add|
Result: The Start a New Project window appears
|2||Enter a name for your Project in the Name field|
|3||To save your new Project and start prepping your data|
To start a new Project from the Library:
- Open the Library page and locate the dataset that you want to use as your base dataset for your new Project.
- Hover over that dataset and click the Create Project button that displays.
Tour the Project Preparation screen
Before you begin preparing data, check out some of the useful areas of the Project Preparation screen.
|Project Name||The name you gave your Project will display here. In the above example, the Project name is Example Data Prep|
|TOOLS||These are the tools you will use to clean, shape, combine, and ultimately prep your data. See the Overview of TOOLS section of this article|
|Steps Editor||Every action you perform while prepping your data is logged as a step. The Steps Editor panel allows you to:|
- View your steps in order
- Mute a step
- Edit what happens during a step
- Rearrange the order of your data preparation steps
- Delete steps
See the Steps Editor article
|Versions Panel||Any time you save your Project, a new version is created. The Versions Panel gives you access to previous versions of your Project. See the Version History article|
|Data Preview||Simply put, this is your data. You will see your data change as you prep it|
|Column Operations||Opens the Column Operations Menu. These operations will help you clean and standardize your data. See the Overview of Column Operations section of this article|
|Grid Tools and Status Updates||Grid Tools allow you to locate specific columns in your dataset, specify column widths, and adjust how cell text displays. Status Updates display when transformations that affect the Data Preview grid or filters are in progress.|
Note: The number of tasks displayed in the update messages may change dynamically as an operation progresses toward completion.
Overview of Column Operations
The following is an overview of the operations you can perform to clean your data.
|FILTER||The FILTER operation combines functionality of filters with the power of histograms. The result is called a Filtergram™. With a Filtergram, you see the relative frequency of each value in a column and select values to temporarily hide some of your data. See the Data Filtergrams article|
|CHANGE||The CHANGE operations allow you to standardize the values in a column. For example, you could change all numbers in a column to numeric values|
|COLUMN||The COLUMN operations allow you to make changes to the column. You can do things like:|
- Split values into multiple columns based on a delimiter character or a given number of characters. See the Split Column article
- Find and replace specified values in the column
- Duplicate the column
|WHITESPACE||The WHITESPACE operations allow you to remove leading and trailing spaces as well as extra spaces within your data|
|OTHER||The OTHER operation is Cluster + Edit. This operation allows you to find values in a column that are similar and edit them so they are the same |
Example: Before using Cluster + Edit, Apple Computer appears in a column as "Apple Computer", "Apple Corporation", and "Apple Computer Corporation". After using Cluster + Edit, all instances of Apple Computer can be standardized to "Apple Computer"
See the Cluster and Edit article.
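In Paxata these column operations are point-and-click, so no code is needed. For readers who want a feel for what some of them do conceptually, here is a minimal Python sketch, assuming a column is just a list of strings. It is not Paxata's implementation: all function names are hypothetical, and the clustering shown is the common "fingerprint" technique, swapped in as an illustration; Paxata's Cluster + Edit may match values differently.

```python
import re
from collections import Counter

# Hypothetical stand-in for one column of a dataset: a list of cell values.
Column = list[str]


def value_frequencies(column: Column) -> list[tuple[str, int]]:
    """FILTER idea: count how often each value occurs, most common first."""
    return Counter(column).most_common()


def clean_whitespace(value: str) -> str:
    """WHITESPACE idea: collapse runs of whitespace to one space, trim both ends."""
    return re.sub(r"\s+", " ", value).strip()


def split_on_delimiter(column: Column, delimiter: str) -> list[list[str]]:
    """COLUMN split idea: break each value into parts at a delimiter."""
    return [value.split(delimiter) for value in column]


def fingerprint(value: str) -> str:
    """Cluster key (the 'fingerprint' technique, an assumption here):
    lowercase, drop punctuation, then sort the distinct words."""
    words = re.sub(r"[^\w\s]", "", value.lower()).split()
    return " ".join(sorted(set(words)))


def find_clusters(column: Column) -> dict[str, list[str]]:
    """Group distinct values that share a fingerprint; keep groups of 2+."""
    groups: dict[str, list[str]] = {}
    for value in dict.fromkeys(column):  # distinct values, first-seen order
        groups.setdefault(fingerprint(value), []).append(value)
    return {key: values for key, values in groups.items() if len(values) > 1}


def standardize(column: Column, variants: list[str], canonical: str) -> Column:
    """Edit step: replace every listed variant with one canonical value."""
    targets = set(variants)
    return [canonical if value in targets else value for value in column]
```

In this sketch, `find_clusters` groups "Apple Computer", "apple computer", and "Computer, Apple" because they differ only in case, punctuation, and word order. It would not group "Apple Corporation" with them; catching variants like that needs a fuzzier comparison, such as edit distance.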
Overview of TOOLS
The following is an overview of the tools you will use to prepare your data.
|highlight||The more data you have, the harder it can be to notice small details. The highlight tools provide visual cues to help you see:|
|attach||When you need to add additional datasets to your Project, use the attach tool. Rows of data can be added to the bottom of your Project. If your datasets have a matching column of data, the additional data can be combined with the data in the Project. See the Lookup article|
|columns||Sometimes you may want to make minor adjustments to your columns. The columns tools let you:|
- Edit your column names
- Rearrange your column order
- Remove columns
|compute||There may be times when you need to write an expression. Maybe you want to concatenate data from multiple cells into one value, or perform a mathematical operation on your data. The compute tool is how you do that. See the Computed Columns article|
|remove||Part of cleaning data is removing information that is not needed. The Remove tool lets you remove rows of data. See the Remove Rows article|
|sampling||You may find it useful to work with a sample of a dataset before bringing all the data into your Project. For large datasets, this can make initial exploration and discovery easier. The sampling tool also gives you the flexibility to filter down to a specific set of rows in your data, and then sample on the remainder|
|shape||Change the shape of your data using the Shape tools. With these tools, you can:|
- Group data
See the Data Shaping Tools article
|auto #||The auto # tool assigns each row a number. This is helpful if you need to give each row a unique identifier|
|new lens||Lenses create publishing points from Steps in your Project. When you publish from a Lens, the resulting AnswerSet is a snapshot at a particular Step in your Project. The AnswerSet is saved to the Library. Lenses are also essential for Project Automation because they define the publishing points to use for automated jobs. See the Project Lenses article|
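Again, these TOOLS are used through the interface, not code. As a conceptual illustration only, the following Python sketch models a dataset as a list of rows (dicts of column name to value) and shows roughly what several tools do to it. Everything here is hypothetical: the function names, the row model, and the choice to sketch a lookup as a left join on one matching column that keeps the first match.

```python
import random
from typing import Any, Callable, Optional

# Hypothetical stand-in for a dataset: a list of rows, each a dict of column -> value.
Row = dict[str, Any]


def append_rows(project: list[Row], other: list[Row]) -> list[Row]:
    """attach (append) idea: add the other dataset's rows to the bottom."""
    return project + other


def lookup(project: list[Row], other: list[Row], key: str) -> list[Row]:
    """attach (lookup) idea, sketched as a left join on one matching column:
    each project row gains the columns of the first `other` row with the same
    `key` value; rows with no match are kept unchanged."""
    index: dict[Any, Row] = {}
    for row in other:
        index.setdefault(row[key], row)  # first match wins
    merged = []
    for row in project:
        match = index.get(row[key], {})
        extra = {col: val for col, val in match.items() if col not in row}
        merged.append({**row, **extra})
    return merged


def add_computed_column(rows: list[Row], name: str, expression: Callable[[Row], Any]) -> list[Row]:
    """compute idea: derive a new column by evaluating an expression per row."""
    return [{**row, name: expression(row)} for row in rows]


def remove_rows(rows: list[Row], predicate: Callable[[Row], bool]) -> list[Row]:
    """remove idea: drop every row for which the predicate is true."""
    return [row for row in rows if not predicate(row)]


def auto_number(rows: list[Row], name: str = "row_id", start: int = 1) -> list[Row]:
    """auto # idea: give each row a sequential, unique number."""
    return [{**row, name: i} for i, row in enumerate(rows, start=start)]


def sample_rows(rows: list[Row], k: int, seed: Optional[int] = None) -> list[Row]:
    """sampling idea: pick up to k rows at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))
```

For example, `add_computed_column(rows, "full_name", lambda r: r["first"] + " " + r["last"])` sketches the concatenation use case described for the compute tool.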
Share your prepped data as an AnswerSet
When you're ready to save and share the data you prepped, publish it to the Library as an AnswerSet. An AnswerSet is like a dataset; the difference is that an AnswerSet is the published result of your data prep. Once published, you can reuse the AnswerSet in other Projects or export it to share with other applications. Your AnswerSet is always published to the Library. AnswerSets can also be created at any time, from any specific Step in your Project, using a Lens.
Publish an AnswerSet
Follow these steps to publish an AnswerSet:
|1||Click steps in the TOOLS menu |
Result: The Steps Editor panel opens
|2||Click the step you want to publish an AnswerSet from |
Note: Paxata defaults to the last step in the Project
|3||At the top of the Steps Editor, click Publish |
Result: The Publish AnswerSet to Library window appears
|4||Enter a name for the AnswerSet in the Name field|
|5||Click Publish |
Result: The Publishing AnswerSet message appears. Paxata publishes an AnswerSet using the steps up to and including the selected step. The AnswerSet is published to the Library
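Conceptually, publishing replays your Project's steps, up to and including the selected step, on the base dataset. The Python sketch below illustrates that idea; it is not how Paxata is implemented. The `Step` class, the `publish` function, and the assumption that a muted step is simply skipped during replay are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stand-in for a dataset: a list of rows (column -> value).
Rows = list[dict]


@dataclass
class Step:
    """One logged data prep action (hypothetical model of a Steps Editor entry)."""
    name: str
    transform: Callable[[Rows], Rows]
    muted: bool = False  # assumption: a muted step is skipped during replay


def publish(base: Rows, steps: list[Step], through: Optional[int] = None) -> Rows:
    """Replay steps 1..through (1-based, inclusive) on a copy of the base
    dataset, skipping muted steps. With no `through`, use the last step."""
    last = len(steps) if through is None else through
    rows = [dict(row) for row in base]  # leave the base dataset untouched
    for step in steps[:last]:
        if not step.muted:
            rows = step.transform(rows)
    return rows
```

In this sketch, publishing from an earlier step (or from a Lens pinned to that step) just means stopping the replay sooner, which is why the same Project can yield different AnswerSets.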
Export your prepped data
Datasets and AnswerSets can be exported out of Paxata, so you can use your data in other applications.
Export your AnswerSet to your computer
Follow these steps to download an AnswerSet or dataset locally:
|1||On the Library screen, hover your mouse over the AnswerSet you want to export|
|2||Click the Export button that displays|
Result: The Exporting screen appears
|3||Click Download file locally |
Result: The Export Settings screen appears
|4||Click Export |
Result: The AnswerSet is downloaded to your computer as a comma-separated values (CSV) file. The Export Log screen appears
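Because the downloaded file is plain CSV, other tools can read it directly. As one hedged illustration, here is a minimal Python sketch that uses only the standard library `csv` module; the function name `parse_answerset_csv` is hypothetical, and the sketch assumes the first row of the file holds the column names.

```python
import csv
import io


def parse_answerset_csv(text: str) -> list[dict[str, str]]:
    """Parse exported CSV text into rows keyed by the header row.
    Quoted fields that contain commas are handled by the csv module."""
    return list(csv.DictReader(io.StringIO(text, newline="")))
```

For example, `parse_answerset_csv(Path("my_answerset.csv").read_text(encoding="utf-8"))` would load a download, where `Path` comes from `pathlib`, the file name is hypothetical, and UTF-8 is an assumed encoding.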
The following are definitions for terms used in this document.
|AnswerSet||Like a dataset except that it is the published result of your data prep|
|Base dataset||The data on which all other actions in the Project are performed|
|Data source||The source of your dataset|
|Dataset||Data that is imported into the Library is called a dataset|
|Filtergram||The combination of the functionality of filters with the power of histograms|