(warning) The Data Prep (Paxata) documentation is now available on the DataRobot public documentation site. See the Data Prep section for user documentation and connector information. After the 2021.2 SP1 release, the content on this site will be removed and replaced with a link to the DataRobot public documentation site.

MS Azure Synapse Analytics Connector Documentation

User Persona: Paxata User, Paxata Admin, Data Source Admin, or IT/DevOps

Note: This document covers all configuration fields available during Connector setup. Some fields may have already been filled in by your Admin at an earlier step of configuration and may not be visible to you. For more information on Paxata's Connector Framework, please see here.

Also: Your Admin may have named this Connector something else in the list of Data Sources.

Configuring Paxata

This connector allows you to connect to an Azure Synapse Analytics instance for Library imports and exports. For export, the connector uploads data into Azure Data Lake Storage (ADLS) and then exposes the data as a table in the SQL Data Warehouse.

The following describes the parameters used to create the connector.

General

  • Name: Name of the data source as it will appear to users in the UI.

  • Description: Description of the data source as it will appear to users in the UI.

Something to consider: You may connect Paxata to multiple Azure Synapse Analytics accounts, so a descriptive name can help users identify the appropriate data source. If you are a Paxata SaaS customer, please inform Paxata DevOps of how you would like this set.

Database URL

The Database URL is where you provide the Java Database Connectivity (JDBC) connection string. This string tells Paxata where the database is for import and export. You can include the schema name in the URL.
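For reference, the connection string typically follows the standard Microsoft SQL Server JDBC format. The server and database names below are placeholders (your endpoint may also take the older <server>.database.windows.net form); confirm the exact string with your Azure administrator:

    jdbc:sqlserver://yourserver.sql.azuresynapse.net:1433;databaseName=yourdatabase;encrypt=true;loginTimeout=30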

Visibility Settings

You can control the databases, schemas, and tables that are shown to users when they browse a data source during import. For databases, schemas, and tables, you can choose to:

  • "Show only," which displays only the databases, schemas, or tables that you specify here.
  • "Hide," which hides the databases, schemas, or tables that you specify here.
  • "Show all," the default setting, which displays everything in the data source.

When you select the "Show only" or "Hide" option, a field is provided for specifying the databases, schemas, or tables on which you want the option enforced.

Important: These settings are not enforced when users query against the data source; query results still return a complete list of matches. For example, if you choose to "hide" a specific database, the query results will still include that database in the returned results. However, that database will not be displayed to users when they browse the data source.

Import Configuration

The import configurations let you specify how data is imported into Paxata.

  • Query Fetch Size: Number of rows per batch to use when fetching data on import.
  • Pre-import SQL: A SQL statement to execute before the import begins, after the table's schema is determined (see the example after this list).
  • Post-import SQL: A SQL statement to execute after the import completes.
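The SQL fields accept statements that your warehouse can execute. The statements below are hypothetical examples only; the procedure, table, and column names are placeholders:

    -- Example pre-import SQL: refresh a staging table before Paxata reads it
    EXEC dbo.refresh_staging_table;

    -- Example post-import SQL: record that the import ran
    INSERT INTO dbo.import_audit (source_table, imported_at)
    VALUES ('dbo.sales', GETDATE());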

Export Configuration

The export configurations let you specify how data is exported out of Paxata.

  • Pre-export SQL: A SQL statement to execute before the export begins, after the table is created (if auto-create is enabled).
  • Post-export SQL: A SQL statement to execute after the export completes.
  • External Data Source Name: Name of the external data source in SQL Data Warehouse used to access data in Azure Data Lake (see the example below).

NOTE: The maximum VARCHAR size is 8000 characters. Values exceeding 8000 characters are exported as an empty string.
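The External Data Source Name field refers to an external data source object defined in the SQL Data Warehouse that points at the Azure Data Lake location used for export. As a rough sketch only (the object name, location, and credential below are placeholders; confirm the actual definition with your Azure administrator), such an object is created with T-SQL along these lines:

    CREATE EXTERNAL DATA SOURCE PaxataADLS
    WITH (
        TYPE = HADOOP,
        LOCATION = 'adl://youraccount.azuredatalakestore.net',
        CREDENTIAL = PaxataADLSCredential
    );

In this case, you would enter PaxataADLS in the External Data Source Name field.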

Credentials

The credentials settings allow you to specify a single user account to authenticate with when connecting to the data source.

Azure Data Lake Configuration

The Azure Data Lake Configuration is where you provide the settings Paxata needs to connect to your Azure Data Lake.

  • ADL URI: The URI for the ADL site.
  • Root Directory: Specifies the top-level of the directory structure from which import/export of data is enabled.
  • Application ID: The application ID for the ADL site.
  • OAuth 2.0 Token Endpoint: The OAuth 2.0 token endpoint for the ADL site.
  • Application Access Key Value: The Application Access Key Value for the ADL site.
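For reference, these values typically take the following forms for an Azure Data Lake Storage Gen1 account registered with an Azure AD application (the account name, tenant ID, and directory below are placeholders; copy the actual values from your Azure portal):

    ADL URI:                      adl://youraccount.azuredatalakestore.net
    Root Directory:               /paxata
    Application ID:               <client ID of the Azure AD application registration>
    OAuth 2.0 Token Endpoint:     https://login.microsoftonline.com/<tenant-id>/oauth2/token
    Application Access Key Value: <client secret generated for the application registration>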

Data Import Information

Via Browsing

Browse to a table and "Select" the table for import. Supported data formats:

  • Delimited datasets: comma, tab…
  • XML
  • JSON
  • Excel: XLS and XLSX
  • Avro
  • Parquet
  • Fixed format

Via SQL Query

Use a valid SQL SELECT query.
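For example (the table and column names here are hypothetical placeholders):

    SELECT order_id, customer_id, order_total
    FROM dbo.orders
    WHERE order_date >= '2021-01-01';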

Data Export Information

For export, the connector uploads data into Azure Data Lake Storage (ADLS) and then exposes the data as a table in the SQL Data Warehouse. Once data is loaded into ADLS, the connector creates a new external table from the exported data's schema, dropping any pre-existing table with the same name.
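Conceptually, the table the connector creates is similar to the following T-SQL sketch. This is illustrative only: the actual column list, location path, data source, and file format are derived from the exported data's schema and the connector's configuration, and the object names below are placeholders.

    CREATE EXTERNAL TABLE dbo.paxata_export
    (
        order_id    INT,
        customer    VARCHAR(8000),
        order_total FLOAT
    )
    WITH (
        LOCATION    = '/paxata/exports/paxata_export/',
        DATA_SOURCE = PaxataADLS,
        FILE_FORMAT = PaxataFileFormat
    );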

Browse to a table and "Select" the table for export.

  • Only one export format is available: JDBC
  • (Optional) Edit the name used for the Azure Synapse Analytics table