Warning: The Data Prep (Paxata) documentation is now available on the DataRobot public documentation site. See the Data Prep section for user documentation and connector information. After the 2021.2 SP1 release, the content on this site will be removed and replaced with a link to the DataRobot public documentation site.

Paxata Connector Setup Overview

What are Paxata Connectors?

Every story at Paxata starts and ends with Connectors. Data preparation is only valuable if you can get the data you need to prep and then send that data where you need it once it’s been prepped. Paxata Connectors are the tools for getting data into and out of Paxata.

Benefits of Paxata Connectors

Business-User-Friendly Data Access

Accessing data on disparate systems isn’t very complicated for coders; most databases, file stores, web services, etc. have well-developed, code-friendly interfaces that adhere to industry standards.

Data Integration is Hard for Non-coding Users

At Paxata, we’ve devoted a lot of resources to tackling this problem and opening up as many data sources as possible to non-coding users. 

  • As one user shared with us, before Paxata, “Data on Hadoop is as good as data on the moon.”
  • Our goal is that a business analyst (non-coding user) can access any data in the organization they are authorized to use.

Browsing vs. Querying

One core aspect of enabling non-coding users is the browsing interface. Where other data prep or ETL solutions rely on SQL queries, every data source in Paxata can be browsed and data can be imported with clicks. 

Control & Governance

The business environment is significantly more fluid than IT infrastructure typically accommodates. Even so, certain people should have access only to certain information and should be able to send that information only to certain places. The Connector framework lets large and complex organizations ensure that users can access only the information granted to them, and it can be configured simply for smaller organizations where speed and self-service are a priority.

Setup of Paxata Connectors

Three Layers of Configuration

When setting up a Connector, there are three hierarchical levels of configuration, from highest to lowest: “Connector,” “Data Source,” and “Session.” If a field is filled out at a higher level, it won’t need to be filled out again downstream. Some fields may be alterable at a later stage, but that varies greatly across the Connectors. 

Connector Configuration:

This level is typically created and managed by an Admin or IT, and it exists to:

  • Make a given Connector available to specific groups of users
  • Allow an administrator to enter information that users won’t know and/or that will be the same across all users and Data Sources that rely on the Connector Config.
  • Allow an Admin to keep sensitive information, e.g. an SSH Key, secure from users who shouldn’t have access.

Data Source Configuration:

This level is typically created and managed by either individual users or admins, depending on how access to the source system data is being managed, and it exists to:

  • Contain all persistent configuration not already captured at the Connector Config level.
    • Typically, this includes everything except for user credentials supplied at runtime for a shared Data Source Config.

Session Configuration:

This level is managed almost exclusively by individual users, or ignored if not required, and it exists to:

  • Capture information at the time an import or export runs.
    • Typically, this is limited to user credentials.
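
The way the three levels combine can be sketched in a few lines of Python. This is only an illustration of the idea, not how Paxata implements it: the function name, the field names, and the rule that a value set at a higher level always wins are assumptions made for the example (as noted above, some Connectors do let a field be altered at a later stage).

```python
from typing import Dict, Optional

# One configuration layer: field name -> value, or None if left blank.
Layer = Dict[str, Optional[str]]


def resolve_fields(connector: Layer, data_source: Layer, session: Layer) -> Dict[str, str]:
    """Merge the three layers from highest to lowest.

    Assumption: a field filled in at a higher level wins, and lower levels
    only supply fields that are still blank. Real Connectors may allow some
    fields to be altered downstream.
    """
    resolved: Dict[str, str] = {}
    for layer in (connector, data_source, session):
        for name, value in layer.items():
            if value is not None and name not in resolved:
                resolved[name] = value
    return resolved


# Hypothetical field names and values, for illustration only.
connector = {"host": "sftp.example.com", "port": "22", "root_directory": None}
data_source = {"root_directory": "/teams/finance", "port": "2222"}
session = {"user": "analyst1"}

effective = resolve_fields(connector, data_source, session)
```

In this sketch the port entered on the Data Source is ignored because the Connector level already supplied one, while the root directory and user are picked up from the lower levels that filled them in.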

Sharing Controls

  • Connector & Data Source Configs can be shared with groups within your tenant.
  • These sharing controls also allow you to specify whether members of the specified groups can Read, Update, or Delete the configuration, and whether they may perform imports and/or exports with it.
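
As a rough model of these controls, a share can be thought of as a group plus a set of permissions and two import/export flags. The sketch below is a hypothetical data model written for illustration; the class, field, and action names are assumptions and do not reflect Paxata's internal representation.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set


@dataclass(frozen=True)
class ShareGrant:
    """One sharing entry on a Connector or Data Source Config (hypothetical model)."""
    group: str
    permissions: FrozenSet[str] = frozenset()  # any of "read", "update", "delete"
    can_import: bool = False
    can_export: bool = False


def is_allowed(grants: Iterable[ShareGrant], user_groups: Set[str], action: str) -> bool:
    """Return True if any grant for one of the user's groups permits the action."""
    for grant in grants:
        if grant.group not in user_groups:
            continue
        if action == "import":
            if grant.can_import:
                return True
        elif action == "export":
            if grant.can_export:
                return True
        elif action in grant.permissions:
            return True
    return False


# A Data Source shared Read-only with the finance group, imports allowed.
grants = [ShareGrant("finance", frozenset({"read"}), can_import=True)]
```

With these grants, a member of the finance group can read the configuration and import with it, but cannot update it or export; a user outside the group gets nothing.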

Example Setups

The following are a few examples of business situations and how the Connector Framework can be set up to accommodate the needs of each team.

Example 1:

Business Situation: An IT-managed SFTP Server is authenticated by “SSH Key with Passphrase.” IT holds the key and passphrase, and several teams need access to different directories.

Setup: 

  • Connector Config
    • IT will create one Connector Config and fill out SFTP Host & Port, SSH Key & Passphrase.
    • Sharing: None
  • Data Source Config
    • Create a new Data Source for each team and specify the appropriate Root Directory.
    • Sharing: Share each fully-configured Data Source as Read-only with the corresponding team and allow imports & exports if appropriate.
  • Session Config
    • N/A

Benefits of this approach:

  • If the credentials change, they only need to be managed in one place.
  • IT can manage credentials and keep them private from users.
  • Each team has the access they need without anyone having to manage access control on the data source itself.
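
The shape of this setup can be written down as plain data. The sketch below is illustrative only: the helper name, the dictionary keys, and the team directories are all hypothetical, and Paxata configurations are created through the product rather than through code like this. The point it shows is that credentials live once on the Connector Config, and each per-team Data Source carries only a Root Directory and its sharing settings.

```python
from typing import Dict, List


def build_team_data_sources(connector_name: str, team_roots: Dict[str, str],
                            allow_import: bool = True,
                            allow_export: bool = False) -> List[dict]:
    """Describe one Read-only Data Source per team (hypothetical structure)."""
    sources = []
    for team, root in sorted(team_roots.items()):
        sources.append({
            "name": f"{connector_name}-{team}",
            "connector": connector_name,   # credentials stay on the Connector Config
            "root_directory": root,
            "shared_with": team,
            "permissions": ["read"],
            "import": allow_import,
            "export": allow_export,
        })
    return sources


# Hypothetical teams and directories.
sources = build_team_data_sources(
    "it-sftp", {"finance": "/data/finance", "sales": "/data/sales"}
)
```

None of the generated entries contain the SSH Key or Passphrase, which mirrors the benefit above that IT can keep credentials private from users.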

Example 2: 

Business Situation: An Admin-managed Salesforce Org where each user should access only the information they have permissions for in Salesforce, and each user will need to run automation jobs within Paxata.

Setup: 

  • Connector Config
    • The Salesforce Admin will create one Connector Config and fill out all relevant information except for User & Password.
    • Sharing: Share this Config with each relevant group as Read-only.
  • Data Source Config
    • Each user should create their own Data Source Config and fill out just their credentials so their setup persists and can be used in automation jobs.
    • Sharing: None
  • Session Config
    • N/A

Benefits of this approach:

  • Admin-level setup is completed by the admin, and each user only needs to enter their username and password, information they should have readily available.
  • Each user’s authorization is managed in Salesforce.
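
The reason credentials go on the Data Source in this example, rather than being supplied per session, can be sketched as a simple check. It rests on an assumption inferred from the text above: an automation job runs unattended, so every required field must already be persisted at the Connector or Data Source level. The field names and helper functions are hypothetical.

```python
from typing import Dict, Iterable, List, Optional

Layer = Dict[str, Optional[str]]

# Hypothetical required fields for a Salesforce-style Connector.
REQUIRED_FIELDS = ("login_url", "user", "password")


def fields_left_for_session(required: Iterable[str], connector: Layer,
                            data_source: Layer) -> List[str]:
    """List required fields that neither persistent level has filled in."""
    persisted = {
        name
        for layer in (connector, data_source)
        for name, value in layer.items()
        if value
    }
    return [name for name in required if name not in persisted]


def can_run_unattended(required: Iterable[str], connector: Layer,
                       data_source: Layer) -> bool:
    """Assumed rule: automation works only if nothing is left for the session."""
    return not fields_left_for_session(required, connector, data_source)


# The Admin's shared Connector Config, and one user's own Data Source Config.
admin_connector = {"login_url": "https://login.salesforce.com"}
user_data_source = {"user": "analyst1@example.com", "password": "example-only"}
```

Here `can_run_unattended(REQUIRED_FIELDS, admin_connector, user_data_source)` holds, whereas pairing the same Connector Config with an empty Data Source would leave the user and password to be entered at session time.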