(warning) The Data Prep (Paxata) documentation is now available on the DataRobot public documentation site. See the Data Prep section for user documentation and connector information. After the 2021.2 SP1 release, the content on this site will be removed and replaced with a link to the DataRobot public documentation site.

JDBC Connector Documentation

User Persona: Paxata Admin - Data Source Admin

*Note: This document covers all configuration fields available during Connector setup. Some fields may have already been filled out by your Admin at an earlier step of configuration and may not be visible to you. For more information on Paxata’s Connector Framework, please see here.

Also: Your Admin may have named this Connector something else in the list of Data Sources.

Configuring Paxata

This Connector enables the ability to import and export data leveraging JDBC Drivers. Typically this connector is leveraged for import/export against relational databases, but many applications offer JDBC drivers. The following fields are used to define the connection parameters.

General

  • Name: Name of the data source as it will appear to users in the UI.
  • Description: Description of the data source as it will appear to users in the UI.

Something to consider: You may connect Paxata to multiple similar JDBC sources and having a descriptive name can be a big help to users in identifying the appropriate data source. 

Database URI 

  • JDBC URI: The JDBC connection string as defined by the driver being used. For more details on connection string options, please see the documentation for the driver you intend to use. JDBC connection strings typically take the form:
    jdbc:<driver>://<host>:<port>:<sid>/<service/DB_Name>:<Option1=Value1>;<Option2=Value2>

Database Visibility

You can control the database, schemas, and tables that are shown to users when they browse a data source during import. For databases, schemas, and tables, you can choose to:

  • "Show only" which returns only the databases, schemas or tables that you specify here.
  • "Hide" which hides the databases, schemas, and tables that you specify here.
  • "Show all" which is the default setting to display everything in the data source.

When you select the "Show only" or "Hide" options, a field is provided for specifying the databases, schemas or tables on which you want the option enforced.

Important: These settings are not enforced when users query against the data source; query results still return a complete list of matches. For example, if you choose to "hide" a specific database, users can still execute queries that pull data from tables within that database. However, that database will not be displayed to users when they browse the data source.

Import Configuration

  • Query Prefetch Size: The number of rows per batch during import. 
    • Please note that a larger batch size will increase throughput for large imports, but setting this to too large a value can cause the Connector process to run out of memory. The default value for this field is 10,000 rows per batch. The amount of memory granted to a Connector process varies greatly depending on the size of your Paxata installation and your Paxata Administrator should be consulted before setting this field to anything greater than the default.
  • Max Column Size: The maximum length, in Unicode characters, for any column. Values larger than this will be replaced with 'null'.
  • Pre-Import SQL: SQL to be executed before import process. This SQL may execute multiple times (for preview and import) and could be multiple SQL statements, newline-delimited. 
  • Post-Import SQL: SQL to be executed after import process. This SQL may execute multiple times (for preview and import) and could be multiple SQL statements, newline-delimited. 
  • Perform Count Query: This selector allows users to prevent the Connector from executing count queries upon import, which can be very slow on some database tables. To disable counting the number of rows while importing, set this to "False".

Please Note: As the Pre- and Post-Import SQL may be executed multiple times throughout the import process, please take care when specifying these values in the Connector/Datasource Configuration as they will be executed for every import performed with this configuration.

Export Configuration

  • Export Batch Size: The number of rows per batch during export. 
    • Please note that a larger batch size will increase throughput for large exports, but setting this to too large a value can cause the Connector process to run out of memory. The default value for this field is 10,000 rows per batch. The amount of memory granted to a Connector process varies greatly depending on the size of your Paxata installation and your Paxata Administrator should be consulted before setting this field to anything greater than the default.
  • Max VARCHAR Size: The maximum width for a VARCHAR column.
    • Please note this connector will attempt to export a column using the CLOB type if the maximum VARCHAR width is exceeded and the database supports to CLOB type.
  • Automatically Create Table: ENABLED | DISABLED
    • ENABLED: Paxata will automatically create a new table when exporting a data set. If a table exists, Paxata will drop the existing table before creating a new table with the same name.  
    • DISABLED: Paxata will not automatically create a new table when exporting a data set. The Connector assumes that a table exists that matches the exported dataset's name & format. The exported data will be appended to the existing table.
  • Pre-Export SQL: SQL to be executed before export process. This SQL will execute once and could be multiple SQL statements, newline-delimited. 
  • Post-Export SQL: SQL to be executed after export process. This SQL will execute once and could be multiple SQL statements, newline-delimited. 

Credentials

  • User: Username used to access the data source.
  • Password: Password used to access the data source.
  • Role: Some applications allow you to specify a role when connecting. Enter your role value here if required, or this field may be left blank.

Data Import & Export Information

Import Via Browsing

  • Browse Databases, Schemas and/or tables based on your configuration settings.  "Select" the table for import. 

Import Via SQL Query

  • Requires a SQL Select Query for your database.
  • Example: SELECT * FROM "SAMPLE_DATA"."TPCH_SF1"."CUSTOMER"

Export Information

  • Browse Databases and/or Schemas based on your configuration settings
  • There may be errors if you attempt to export directly under a catalog and not a schema when the database supports catalogs, schemas, and tables.
  • (Optional) Edit the name used for the table name.

Tier 2 Support for JDBC

Non-SaaS customers of Paxata are able to supply and install their own drivers in for use with the JDBC Connector. This functionality is not currently available for SaaS customers.

Paxata has a test kit that can help assess whether a given JDBC driver will be a good fit for use in Paxata. The Paxata Customer Success team will run this test kit on your behalf. 

Note: Success with this tool does not certify the supplied JDBC driver for official support. Drivers that pass tests using this tool are not guaranteed to work with current or future versions of the Paxata application. The JDBC Test Kit is not a comprehensive JDBC test suite tool.

What does “Tier 2” mean?

If the JDBC Connector Test Kit passes all tests, Paxata supports usage of the combination of:

  • A specific Paxata JDBC connector version and a specific Paxata Core Server version. These numbers generally match the Test Kit version, but this is not always the case.
  • A specific database/application version.
  • A specific JDBC driver version.
  • The datatypes tested.
  • Only the successfully tested features - There are currently three: Import via query, Import via browse, and Export.

What else does this mean:

  • Paxata does NOT certify the database/application for JDBC or other connector connectivity.
  • Paxata does NOT explicitly or formally test against the database/application.
  • Paxata does NOT test new versions nor Service Packs for potential regressions of Tier 2 data sources.
  • Paxata does NOT make Tier 2 JDBC sources available in Paxata Cloud. Only the certified (Tier 1) JDBC sources are available in Paxata Cloud.

How do I set it up?

If you have a JDBC driver you would like to use with Paxata, please follow these steps for using it under the JDBC Connector as a Tier 2 data source: 

  1. Start by reaching out to your Customer Success representative and provide the driver you would like to use, they can test the driver for you using the test kit.
  2. If the test is successful, the Customer Success representative will help you install the driver in the correct directory on the Paxata Core Server and add the driver to the registry.
  3. From there, you can refer to your driver documentation and the details above to configure the JDBC Connector to work with your new driver.