Warning: The Data Prep (Paxata) documentation is now available on the DataRobot public documentation site. See the Data Prep section for user documentation and connector information. After the 2021.2 SP1 release, the content on this site will be removed and replaced with a link to the DataRobot public documentation site.

Snowflake Data Warehouse Connector Documentation

User Persona: Paxata User, Paxata Admin, Data Source Admin/IT


Note: This document covers all configuration fields available during Connector setup. Some fields may have already been filled out by your Admin at an earlier step of configuration and may not be visible to you. For more information on Paxata’s Connector Framework, please see here.

Also: Your Admin may have named this Connector something else in the list of Data Sources.

Configuring Paxata

The Snowflake connector allows you to use a JDBC-based connection for Library imports and exports. The following fields are used to create a connection.

General

  • Name: Name of the data source as it will appear to users in the UI.
  • Description: Description of the data source as it will appear to users in the UI.

Something to consider: You can connect Paxata to multiple Snowflake accounts, so a descriptive name can be a big help to users in identifying the appropriate data source. If you are a Paxata SaaS customer, please inform Paxata DevOps how you would like this set.

Database URI
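As a reference sketch, a Snowflake JDBC URI typically takes the form below; the bracketed values are placeholders for your own account identifier, warehouse, database, and schema:

jdbc:snowflake://<account_identifier>.snowflakecomputing.com/?warehouse=<warehouse>&db=<database>&schema=<schema>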

Database, Schema, and Table Visibility

You can control the database, schemas, and tables that are shown to users when they browse a data source during import. For each, you can choose to:

  • "Show only" which returns only the databases, schemas or tables that you specify here.
  • "Hide" which hides the databases, schemas, and tables that you specify here.
  • "Show all" which is the default setting to display everything in the data source.

When you select the "Show only" or "Hide" option, a field is provided for specifying a comma-separated list of the databases, schemas, or tables on which you want the option enforced. For example, entering TPCH_SF1,TPCH_SF10 at the schema level with "Show only" limits browsing to those two schemas.

Important: These settings are not enforced when users query against the data source using SQL; query results still return a complete list of matches. For example, if you choose to "hide" a specific database, users can still execute queries that pull data from tables within that database. However, that database will not be displayed to users when they browse the data source.
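For example (the database and table names here are hypothetical), if ANALYTICS_DB is hidden from browsing, a user with sufficient Snowflake privileges can still run the following query and receive results:

SELECT * FROM "ANALYTICS_DB"."PUBLIC"."ORDERS"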

Import Configuration

  • Query Prefetch Size: The batch size (in rows) used when prefetching data on import.
  • Max Column Size: The maximum size in Unicode characters allowed in a cell when importing or exporting data.
  • Pre-import SQL: A SQL statement to execute before beginning import, after the table's schema is determined (see the example after this list). This SQL will also be executed before data is previewed in the Paxata UI.
  • Post-import SQL: A SQL statement to execute after import completes. This SQL will also be executed before data is previewed in the Paxata UI.
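For example, a Pre-import SQL statement might set session parameters before the load. This is a minimal sketch; the parameter shown is just one possibility:

ALTER SESSION SET TIMEZONE = 'UTC';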

Export Configuration

  • Export method: Select the method for exporting data from Snowflake. Both methods correspond to how Snowflake loads the data. For more details on these options, please refer to the Snowflake documentation linked from each option. There are two options:
    • Internal stage: Write data to a file on a Snowflake internal stage before loading the data into a table. This method is recommended for larger datasets since it will be faster than Direct SQL.
      Stage Types: 
      • Temporary: The stage created will be dropped at the end of the session in which it was created. The stage is managed by Snowflake, so no further configuration is required.
      • Permanent: Specifies the name of a Stage that has already been created in Snowflake.
        • Stage Name: Specifies the name of an existing named Internal Stage in Snowflake (see the example after this list). See the Snowflake Identifier Syntax.
    • Direct SQL: Use SQL insert statements to export data. For larger datasets, this approach is slower than using an internal stage.
      • Export Batch Size: The batch size used when exporting data if the Direct SQL export method has been selected.
  • Max VARCHAR Size: The maximum size allowed for a VARCHAR column on export. Values greater than this size will be replaced with null when data is loaded into the Snowflake table.
  • Automatically create table: Create a new table automatically on export. If enabled, Paxata will drop the table whose name matches the name of the exported dataset, if one already exists, and recreate the table using the exported dataset. If not enabled, Paxata will not create a new table, but will instead load the exported data into the table whose name matches the name of the exported dataset.
  • Pre-export SQL: A SQL statement to execute before beginning export, after the table is created if auto-create is enabled. 
  • Post-export SQL: A SQL statement to execute after export completes.
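If you use the Permanent stage type, the named internal stage must already exist in Snowflake. A minimal sketch of creating one (the stage name is hypothetical):

CREATE STAGE paxata_export_stage;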

Note: Paxata exports using the TIMESTAMP_LTZ(9) type. If a table was created with a different timestamp type, exporting Paxata data to a column of that mismatching type will result in an error: "There was an error while performing the export. Reason: SQL compilation error: Expression type does not match column data type, expecting TIMESTAMP_#### but got TIMESTAMP_LTZ(9) for column Column_Name."
To correct this, do one of the following:

  • Allow Paxata to create the table and then perform the export, or
  • Create your table with TIMESTAMP_LTZ(9) and then perform the export.
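For the second option, a compatible table could be created as follows (the table and column names are hypothetical):

CREATE TABLE my_export_table (
    id NUMBER,
    created_at TIMESTAMP_LTZ(9)
);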

Credentials

User authentication can be done through a Shared Account or an Individual Account. If you choose to authenticate with an Individual Account, each user will be prompted to enter a username and password to access this Data Source. If you choose to authenticate with a Shared Account, the following fields are required:

  • User: The username for a shared account used to authenticate with the database.
  • Password: The password for a shared account used to authenticate with the database.
  • Role: The role to set for the session using 'USE ROLE'. The specified role should be an existing role that has already been assigned to the user. Roles can also be specified in the JDBC URI, but the value specified in the Role field supersedes a role specified in the URI. Information on Snowflake roles can be found here: https://docs.snowflake.net/manuals/sql-reference/sql/use-role.html
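When a role is specified, the session role is set with a statement equivalent to the following (the role name here is hypothetical):

USE ROLE ANALYST;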

Data Import Information

Via Browsing

The browsing experience varies based on the database, schema, and table visibility settings described above and on the user credentials supplied.

Via SQL Query

As noted in the database, schema, and table visibility section, what a user can import via query is limited only by their authorization, as determined by the credentials supplied for the connection.

Queries can be performed using any legal SQL SELECT statement as defined here: https://docs.snowflake.net/manuals/sql-reference/sql/select.html

Example:

SELECT * FROM "SNOWFLAKE_SAMPLE_DATA"."TPCH_SF1"."CUSTOMER"