Warning: The Data Prep (Paxata) documentation is now available on the DataRobot public documentation site. See the Data Prep section for user documentation and connector information. After the 2021.2 SP1 release, the content on this site will be removed and replaced with a link to the DataRobot public documentation site.

MS Azure Data Lake Storage (ADLS) Connector Documentation

User Persona: Paxata User - Paxata Admin - Data Source Admin - IT/DevOps

Note: This document covers all configuration fields available during Connector setup. Your Admin may have already filled in some fields at an earlier configuration step, so those fields may not be visible to you. For more information, see the documentation on Paxata’s Connector Framework.

Also: Your Admin may have named this Connector something else in the list of Data Sources.

Configuring Paxata

This connector allows you to connect to Azure Data Lake Storage (ADLS) for Library imports and exports. The following fields are used to define the connection parameters.

General

  • Name: Name of the data source as it will appear to users in the UI.

  • Description: Description of the data source as it will appear to users in the UI.

Something to consider: You can connect Paxata to multiple Azure Data Lake Storage accounts, so a descriptive name helps users identify the appropriate data source.

Azure Data Lake Storage Configuration

  • ADL URI: The URI for the ADL site.
  • Root Directory: The top-level directory from which data import and export are enabled.
  • Application ID: The application ID for the ADL site.
  • OAUTH 2.0 Token Endpoint: The OAuth 2.0 token endpoint for the ADL site.
  • Application Access Key Value: The Application Access Key Value for the ADL site. See Issue 1 in the FAQ/Troubleshooting/Common Issues section below for more information.
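
As a rough illustration of the fields above, the following sketch checks the shape of two values before they are entered in the form. The account name, tenant ID, and variable names are hypothetical placeholders. The expected formats (an adl:// URI and an HTTPS endpoint ending in /oauth2/token) are assumptions based on the usual Azure conventions for ADLS Gen1 and are not confirmed by this document, so your environment may differ.

```shell
#!/bin/sh
# Hypothetical placeholder values; replace them with the values for your account.
adl_uri='adl://exampleaccount.azuredatalakestore.net'
token_endpoint='https://login.microsoftonline.com/00000000-0000-0000-0000-000000000000/oauth2/token'

problems=0

# Assumption: the ADL URI uses the adl:// scheme followed by a host name.
case "$adl_uri" in
  adl://?*) ;;
  *) echo "ADL URI does not start with adl://"
     problems=$((problems + 1)) ;;
esac

# Assumption: the token endpoint is an HTTPS URL that ends in /oauth2/token.
case "$token_endpoint" in
  https://?*/oauth2/token) ;;
  *) echo "Token endpoint is not an HTTPS URL ending in /oauth2/token"
     problems=$((problems + 1)) ;;
esac

echo "Problems found: $problems"
```

A check like this only catches copy-and-paste mistakes; Test Connection remains the actual verification.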

FAQ/Troubleshooting/Common Issues

  • Q: Can we have both ADLS Gen1 and ADLS Gen2 Connectors in the same Paxata account?
    A: Yes! The two Connectors can coexist and will not interfere with each other.
  • Issue 1: When you Test Connection, it fails and reports a "base64" issue.
    How to fix it: In March 2020, Azure changed the format of the Application Access Key Value, and the new format does not work for authentication. To work around this, use the Azure command line to set a Base64 encoded version of the Application Access Key Value. In the Azure Portal:
    1. Create a new Application Access Service Account.
    2. Copy the generated Access Key value.
    3. Base64 encode the Access Key value from step 2 (referred to as the password in the remaining steps).
      1. Mac example: echo -n '<password>' | openssl base64
      2. Windows: use a tool like Base64 Encoder
    4. Build an Azure command that resets the password to the Base64 encoded version from step 3:
      1. az ad sp credential reset --name <Name or app ID of the service principal> --credential-description "<description>" --append --years 2 -p "<Base64 encoded password from step 3>" -o=jsonc
    5. Open a command prompt in the Azure Portal, then paste and run the command from step 4.
    6. Ensure that this service account has appropriate ACLs for the storage. If the permissions are insufficient, you will receive ACL errors.
    7. In Paxata, set Application Access Key Value to the new Base64 encoded password.
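
The encoding in step 3 can also be sketched as a small script that verifies the result by decoding it again. This is a minimal sketch, assuming a POSIX shell with the `base64` and `tr` utilities available; the sample key and variable names are hypothetical. It uses `printf '%s'` in place of `echo -n` because not every shell's `echo` honors `-n`, and a stray trailing newline would change the encoded value.

```shell
#!/bin/sh
# Hypothetical sample value; substitute the Access Key copied in step 2.
access_key='example-access-key'

# printf '%s' writes the value without a trailing newline, so only the
# key itself is encoded. tr removes any line wrapping added by base64.
encoded=$(printf '%s' "$access_key" | base64 | tr -d '\n')

# Decode the result to confirm it round-trips to the original value.
decoded=$(printf '%s' "$encoded" | base64 --decode)

if [ "$decoded" = "$access_key" ]; then
  echo "Base64 value: $encoded"
else
  echo "Round-trip check failed" >&2
fi
```

The printed value is what goes into the `-p` argument in step 4 and into the Application Access Key Value field in step 7.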