The Data Prep (Paxata) documentation is now available on the DataRobot public documentation site. See the Data Prep section for user documentation and connector information. After the 2021.2 SP1 release, the content on this site will be removed and replaced with a link to the DataRobot public documentation site.
User Persona: Paxata User - Paxata Admin - Data Source Admin
*Note: This document covers all configuration fields available during Connector setup. Some fields may have already been filled out by your Admin at an earlier step of configuration and may not be visible to you. For more information on Paxata’s Connector Framework, please see here.
Also: Your Admin may have named this Connector something else in the list of Data Sources.
This connector lets you connect to a REST API and import REST resources. The parameters used to create the connector are described below.
Name: Name of the data source as it will appear to users in the UI.
Description: Description of the data source as it will appear to users in the UI.
Something to consider: You can use the REST API Connector to connect Paxata to multiple sources, and potentially to multiple instances of the same source. A descriptive name helps users identify the appropriate data source.
If you connect to your REST API source through a proxy server, these fields define the proxy details.
Web Proxy: Select 'None' if no proxy is required, or 'Proxied' if the connection to the REST endpoint should be made through a proxy server. If you select 'Proxied', complete the following fields to enable the proxied connection.
Proxy host: The host name or IP address of the web proxy server.
Proxy port: The port on the proxy server to use for the data source connection.
Proxy username: The username for the proxy server.
Proxy password: The password for the proxy server. Leave the username and password blank for an unauthenticated proxy connection.
REST API Configuration
In this section, provide the information used to locate the REST API resource.
Base URL: Base URL of the REST API. The base URL must include the protocol (http/https), hostname (port number is optional) and context path.
Resources: The REST resources to import. Each line must contain a single REST resource configuration in name:path?query format.
The name is the user-visible name of the resource to import and is required. This name is presented in the Browse user interface, for example: Account Details.
The path is the path to the resource and is required. The path should start with a slash (/) and may contain multiple segments separated by slashes, for example: /resource/sub-category
The query is an optional filtering criterion used when retrieving the resource. The query syntax must be key=value pairs delimited by '&', for example: criteria=active&order=desc or jql=status=done.
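The resource format above can be illustrated with a short Python sketch. It assumes that the name is everything before the first colon and that the Base URL and path are joined by simple concatenation; `parse_resource`, `build_url`, and the `api.example.com` base URL are hypothetical illustrations, not part of the connector.

```python
import re
from urllib.parse import urlsplit

# name: everything before the first colon; path: starts with "/" and stops at "?";
# query: optional, everything after "?".
RESOURCE_RE = re.compile(r"^(?P<name>[^:]+):(?P<path>/[^?]*)(?:\?(?P<query>.+))?$")


def parse_resource(line):
    """Split a hypothetical 'name:path?query' entry into its three parts."""
    match = RESOURCE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not in name:path?query format: {line!r}")
    name, path, query = match["name"].strip(), match["path"], match["query"]
    if query is not None:
        # Every '&'-delimited pair must be key=value; the value may itself contain '='.
        for pair in query.split("&"):
            if "=" not in pair:
                raise ValueError(f"query pair {pair!r} is not key=value")
    return name, path, query


def build_url(base_url, path, query=None):
    """Join a Base URL (protocol, host, optional port, context path) with a resource path."""
    parts = urlsplit(base_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"Base URL needs http(s) and a hostname: {base_url!r}")
    url = base_url.rstrip("/") + path
    return f"{url}?{query}" if query else url


name, path, query = parse_resource("Account Details:/resource/sub-category?criteria=active&order=desc")
print(name, build_url("https://api.example.com:8443/v1", path, query))
```

Note that a value such as jql=status=done is accepted because only the first '=' separates the key from the value.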
REST API Authentication Configuration
In this section, provide the information used to authenticate to the REST API service endpoint.
Authentication Type: Select one of the options based on your requirement.
No Auth: if the REST API doesn't require any authentication.
Basic Auth: if the REST API allows authentication with Username and Password.
Bearer Token: if the REST API allows authentication through Bearer Token. In the case of Bearer Token, each web service may provide access to or the generation of tokens differently and the web service’s documentation should explain how to find it.
Username and Password: If Authentication Type is set to Basic Auth, these fields are used for authentication. Most web services require both fields, but some require only one, so the configuration page allows either or both to be left blank. A blank field may cause an error when authenticating to the data source, but it does not cause form validation errors when you save the Data Source.
Bearer Token: If Authentication Type is set to Bearer Token, this token must be provided for authentication. How the token is obtained varies from system to system, so consult the web service's documentation; obtaining the token may also require help from an Administrator.
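As a rough illustration of what each Authentication Type sends, the following Python sketch builds the standard HTTP Authorization header for Basic (RFC 7617) and Bearer (RFC 6750) credentials. The function name `auth_headers` is hypothetical, and the sketch assumes the connector sends credentials in the standard way; it does not show how Paxata stores them.

```python
import base64


def auth_headers(auth_type, username="", password="", token=""):
    """Return the HTTP headers a client would send for each Authentication Type."""
    if auth_type == "No Auth":
        return {}
    if auth_type == "Basic Auth":
        # RFC 7617: base64-encode "username:password". Blank fields still produce
        # a header; the server decides whether the credentials are accepted.
        credentials = f"{username}:{password}".encode("utf-8")
        return {"Authorization": "Basic " + base64.b64encode(credentials).decode("ascii")}
    if auth_type == "Bearer Token":
        # RFC 6750: the token is sent as-is after the "Bearer" scheme name.
        return {"Authorization": f"Bearer {token}"}
    raise ValueError(f"unknown Authentication Type: {auth_type!r}")


print(auth_headers("Basic Auth", "user", "pass"))
```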
REST API Test Connection & Operation Configuration
Test Connection & Operation Method: The HTTP method the Paxata connector uses to test whether it can connect to the REST API service, and the method it uses when it requests a resource. Selecting "Automatic" tries `HEAD`, `GET`, and `POST` to test the connection and is the best option if you're unsure which method to select.
The method that succeeds during the test also determines the method used for the actual import: if `GET` or `HEAD` succeeds, `GET` is used for import; if `POST` succeeds, `POST` is used for import.
Connection Timeout: Timeout (in milliseconds) for connecting to REST API.
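The "Automatic" behavior described above can be sketched in Python as follows. The probe order (`HEAD`, then `GET`, then `POST`) and the helper names `choose_import_method` and `make_http_probe` are assumptions for illustration; the Connection Timeout is converted from milliseconds to the seconds that `urllib.request.urlopen` expects.

```python
import urllib.request


def make_http_probe(url, timeout_ms):
    """Return a probe(method) -> bool that sends one request with the given method."""
    def probe(method):
        request = urllib.request.Request(url, method=method)
        try:
            # The setting is in milliseconds; urlopen expects seconds.
            with urllib.request.urlopen(request, timeout=timeout_ms / 1000) as response:
                return response.status < 400
        except OSError:
            # URLError, HTTPError, and socket timeouts are all OSError subclasses.
            return False
    return probe


def choose_import_method(probe):
    """Mimic "Automatic": try HEAD, GET, POST in order and map the first success."""
    for method in ("HEAD", "GET", "POST"):
        if probe(method):
            # A successful HEAD or GET means GET is used for import.
            return "GET" if method in ("HEAD", "GET") else "POST"
    return None  # no method succeeded; the connection test would fail


# Usage against a real endpoint (hypothetical URL, 5000 ms timeout):
# method = choose_import_method(make_http_probe("https://api.example.com/v1/items", 5000))
```

Passing the probe in as a function keeps the selection rule separate from the network call, so the rule can be checked without a live endpoint.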
Data Import Information
The connector presents each resource in the import workflow as an importable dataset, using the resource name defined in the Resources list.
For paginated REST responses, each paginated response contains HTTP Headers that identify the URL for the next page of results.
When a paginated dataset is requested, the REST Connector will automatically identify that the dataset is paginated and follow data links.
During import through the Paxata UI, only one page of data values is presented in the Preview, so that the Preview appears quickly and fewer calls are made against rate-limited APIs.
During Import, the Connector will:
Automatically extract the HTTP link for the next page of data.
Return the results from the current page of results.
Execute a call to obtain the next page of results.
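A minimal Python sketch of this loop, assuming the next-page URL arrives in a Link header of the form `<url>; rel="next"` and that `fetch_page` is a hypothetical callable returning a page's records and headers. Passing `max_pages=1` mimics the one-page Preview; the regular expression handles only the common `<url>; rel="..."` form, not every RFC 5988 variant.

```python
import re

# Matches entries like <https://host/items?page=2>; rel="next" in a Link header.
LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="?([^";,]+)"?')


def parse_link_header(value):
    """Map each rel value (next, last, ...) to its URL."""
    links = {}
    for url, rels in LINK_RE.findall(value or ""):
        for rel in rels.split():
            links[rel] = url
    return links


def import_pages(start_url, fetch_page, max_pages=None):
    """Follow rel="next" links; fetch_page(url) returns (records, headers)."""
    rows, url, pages = [], start_url, 0
    while url and (max_pages is None or pages < max_pages):
        records, headers = fetch_page(url)
        next_url = parse_link_header(headers.get("Link")).get("next")  # extract next link
        rows.extend(records)  # keep the current page's results
        pages += 1
        url = next_url  # the next iteration requests the next page
    return rows
```

With a real HTTP client, `headers` would be the response headers; the header object returned by `urllib` also supports `.get("Link")`, matching names case-insensitively.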
The performance of the REST Connector is very dependent upon the implementation of the REST API that it leverages.
Best performance is found for REST APIs that support returning an entire dataset per REST API invocation. This is typical of APIs that leverage chunked transfer encoding. In this scenario, the REST Connector executes a single API call to obtain a full dataset.
REST APIs that use pagination reduce performance because they require additional REST API calls.
Pagination style: RFC 5988 (Web Linking)
Each response contains N records and an HTTP Header containing a URL that points to the next batch.
Review your REST API documentation to identify the maximum page size that you can configure in order to reduce the number of API calls.
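The effect of page size on the number of calls is a simple ceiling division. The sketch below uses a hypothetical 1,000-record dataset; the numbers are illustrative, not measurements of any particular API.

```python
def api_calls_needed(total_records, page_size):
    """Number of paginated calls to retrieve total_records (at least one call)."""
    if page_size < 1:
        raise ValueError("page_size must be positive")
    # Integer ceiling division; an empty dataset still costs one call to discover.
    return max(1, -(-total_records // page_size))


for size in (30, 100):
    print(size, api_calls_needed(1000, size))
```

At 30 records per page the hypothetical import needs 34 calls; at 100 records per page it needs 10.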
Example: GitHub REST API
APIs can have rate limitations. When importing large datasets that are paginated, it is not uncommon to run into limitations on the number of REST calls made within a window of time. For example, GitHub allows for 5000 requests per hour and Google Drive allows for 1000 requests per 100 seconds.
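This document does not describe any built-in throttling in the connector, so the following is only a client-side illustration of staying under such limits: a hypothetical `Throttle` class that evenly spaces requests `window / limit` seconds apart (0.72 s for 5000 requests per hour, 0.1 s for 1000 requests per 100 seconds). The clock and sleep functions are injectable so the behavior can be checked without waiting.

```python
import time


class Throttle:
    """Space requests evenly so the average rate stays within limit per window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic, sleep=time.sleep):
        self.interval = window_seconds / limit  # seconds between requests
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None

    def wait(self):
        """Block until the next request may be sent, then reserve the following slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval


github = Throttle(5000, 3600)       # 5000 requests per hour -> 0.72 s apart
google_drive = Throttle(1000, 100)  # 1000 requests per 100 s -> 0.1 s apart
print(github.interval, google_drive.interval)
```

Even spacing is stricter than many fixed-window limits require, but it is simple and never bursts.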
Is OAuth authentication supported?
Not at this time. Currently, only username/password and token authentication methods are supported. Many data sources allow only OAuth authentication, and those sources are not supported at this time. Please contact your Paxata Client Success team if you are unable to connect to a data source for this reason.
What do the “Test Connection” messages mean?
Test Connection verifies that each entry in the Resource List matches the expected format.
Failure of an entry to match the expected format results in an error indicating the identified format issue and the entry number.
Using the same dataset name for more than one Resource entry results in a format validation failure.
After testing confirms that the resources are properly formatted, only the first entry in the Resource List is used to verify that connectivity is configured correctly.
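The validation steps above can be approximated in Python. This is a sketch under assumptions: the error wording, the `validate_resources` name, and the exact format rules are hypothetical, and the connector's real messages may differ.

```python
import re

RESOURCE_RE = re.compile(r"^(?P<name>[^:]+):(?P<path>/[^?]*)(?:\?(?P<query>.+))?$")


def validate_resources(resource_list):
    """Return (errors, first_entry); first_entry is what a connectivity test would use."""
    errors, entries, seen = [], [], set()
    for number, line in enumerate(resource_list.strip().splitlines(), start=1):
        match = RESOURCE_RE.match(line.strip())
        if match is None:
            errors.append(f"Entry {number}: expected name:path?query")
            continue
        name, path, query = match["name"].strip(), match["path"], match["query"]
        if query is not None and any("=" not in pair for pair in query.split("&")):
            errors.append(f"Entry {number}: query must be key=value pairs joined by '&'")
            continue
        if name in seen:
            errors.append(f"Entry {number}: duplicate dataset name {name!r}")
            continue
        seen.add(name)
        entries.append((name, path, query))
    # Connectivity is checked only after every entry passes the format checks.
    first_entry = entries[0] if entries and not errors else None
    return errors, first_entry
```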
The following are real-world examples of how the REST API Connector has been used. Feel free to use any of these in your account, but be advised that companies may change their APIs at any time, sometimes without notice, and these are not fully supported data sources. This means that Paxata will be unable to help you troubleshoot issues with these configurations, and they may become out of date.
Simple Learning Example
There are many simple, unauthenticated REST API resources on the web that were created for the express purpose of learning, testing, and prototyping. One of those sources is JSONPlaceholder. This example may be oversimplified, but it is intended to demonstrate the building blocks of connecting to a RESTful web service with the REST API Connector.
No rate limits are posted, and there is no pagination because the datasets are small.
Example JQL resource: run a Jira JQL query to retrieve 200 To Do task items
The JQL query had to be URL-encoded before it was pasted into the configuration.
Connector To Do Tasks:/search?jql=Project%3DyourProject%20and%20statusCategory%3D%22To%20Do%22&maxResults=200
Authentication: Basic (username/password)
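The URL-encoding step can be done with Python's `urllib.parse.quote`. Passing `safe=""` ensures that `=` and `/` are encoded too; the result matches the encoded query in the resource entry above. The helper name `jira_search_resource` is hypothetical.

```python
from urllib.parse import quote


def jira_search_resource(name, jql, max_results):
    """Build a name:path?query entry for a Jira search, URL-encoding the JQL."""
    # safe="" encodes every reserved character, including "=", "/", and quotes.
    return f"{name}:/search?jql={quote(jql, safe='')}&maxResults={max_results}"


print(jira_search_resource("Connector To Do Tasks",
                           'Project=yourProject and statusCategory="To Do"', 200))
```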
Pagination for Jira
Jira does not support RFC 5988 pagination. To support pagination for Jira:
Define data source Resource entries that each specify a page of data:
Use "maxResults=100" to maximize the number of entries per REST call.
Use "startAt=N" to specify the starting point. N starts at 0.
Example: 4 pages of 100 search results
All Issues 0:/search?jql=&startAt=0&maxResults=100
All Issues 1:/search?jql=&startAt=100&maxResults=100
All Issues 2:/search?jql=&startAt=200&maxResults=100
All Issues 3:/search?jql=&startAt=300&maxResults=100
Use Paxata's Wildcard feature to select all pages of data
Wildcard pattern = "All Issues*"
After JSON parsing, which flattened the JSON and duplicated some rows to account for subtasks, we obtained 557 rows by 298 columns of data.
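Writing the page entries by hand is error-prone, so a small Python sketch can generate them. The helper names are hypothetical, and `fnmatch` glob matching is used only as an approximation of Paxata's Wildcard feature, whose exact matching rules are not described here.

```python
from fnmatch import fnmatchcase
from urllib.parse import quote


def jira_page_entries(prefix, pages, page_size=100, jql=""):
    """One resource entry per page: startAt = page number * page_size (starting at 0)."""
    encoded = quote(jql, safe="")
    return [
        f"{prefix} {page}:/search?jql={encoded}&startAt={page * page_size}&maxResults={page_size}"
        for page in range(pages)
    ]


def select_by_wildcard(names, pattern):
    """Approximate a wildcard selection with glob-style, case-sensitive matching."""
    return [name for name in names if fnmatchcase(name, pattern)]


entries = jira_page_entries("All Issues", 4)
print("\n".join(entries))
names = [entry.split(":", 1)[0] for entry in entries]
print(select_by_wildcard(names, "All Issues*"))
```

The generated lines match the four "All Issues" entries in the example above, and the pattern "All Issues*" selects all four names.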
US Census Data Example
The Census data website is not a REST API, but the REST API Connector can still be used to retrieve its data over HTTP.