(warning) The Data Prep (Paxata) documentation is now available on the DataRobot public documentation site. See the Data Prep section for user documentation and connector information. After the 2021.2 SP1 release, the content on this site will be removed and replaced with a link to the DataRobot public documentation site.

Spark SQL Connector Documentation

User Persona: Paxata User - Paxata Admin - Data Source Admin 

Note: This document covers all configuration fields available during Connector setup. Some fields may have already been filled out by your Admin at an earlier step of configuration and may not be visible to you. For more information, see the documentation on Paxata's Connector Framework.

Also: Your Admin may have named this Connector something else in the list of Data Sources.

Configuring Paxata

This connector allows you to connect to Spark SQL for browsing, importing, and exporting available data. The following fields are used to define the connection parameters.

General

  • Name: Name of the data source as it will appear to users in the UI.
  • Description: Description of the data source as it will appear to users in the UI.

Something to consider: You can connect Paxata to multiple Spark SQL instances, so a descriptive name helps users identify the appropriate data source.

Spark SQL Server Configuration

  • Spark SQL Server: The hostname or IP address of the server hosting the Spark SQL database.
  • Spark SQL Port: The port for the Spark SQL database.
  • Use SSL: Set this property to the value specified in the 'hive.server2.use.SSL' property of your Hive configuration file (hive-site.xml).
  • Transport Mode: Set this property to the value specified in the 'hive.server2.transport.mode' property of your Hive configuration file (hive-site.xml).
  • HTTP Path: The path component of the URL endpoint, used when Transport Mode is HTTP. Set this property to the value specified in the 'hive.server2.thrift.http.path' property of your Hive configuration file (hive-site.xml).
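
The three Hive-derived values above can be looked up directly in hive-site.xml. The following is a minimal Python sketch, assuming the file uses the standard Hadoop configuration layout (a `<configuration>` root containing `<property>` elements with `<name>` and `<value>` children). The function name `read_hive_settings` and the sample XML are illustrative only and are not part of Paxata.

```python
import xml.etree.ElementTree as ET

# Hive properties mapped to the connector fields they feed (names from the list above).
WANTED = {
    "hive.server2.use.SSL": "Use SSL",
    "hive.server2.transport.mode": "Transport Mode",
    "hive.server2.thrift.http.path": "HTTP Path",
}

def read_hive_settings(xml_text):
    """Return {connector field: value} for the properties found in hive-site.xml."""
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name in WANTED:
            found[WANTED[name]] = (prop.findtext("value") or "").strip()
    return found

# Hypothetical sample; real values come from your cluster's hive-site.xml.
SAMPLE = """<configuration>
  <property><name>hive.server2.use.SSL</name><value>true</value></property>
  <property><name>hive.server2.transport.mode</name><value>http</value></property>
  <property><name>hive.server2.thrift.http.path</name><value>cliservice</value></property>
</configuration>"""

if __name__ == "__main__":
    for field, value in read_hive_settings(SAMPLE).items():
        print(f"{field}: {value}")
```

Properties that are missing from the file are simply left out of the result, so an absent key indicates that the Hive default applies and should be confirmed with your Admin.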

Spark SQL Server Authentication Configuration

  • User: The username used to authenticate with Spark SQL. For Databricks, set to 'token'.
  • Password: The password used to authenticate with Spark SQL. For Databricks, set to your personal access token. You can obtain the token by navigating to the User Settings page of your Databricks instance and selecting the Access Tokens tab.
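
The Databricks rule above (a fixed username of 'token' paired with a personal access token) can be summarized in a short Python sketch. The helper `spark_sql_credentials` and its parameters are hypothetical and exist only to illustrate which values go into the User and Password fields.

```python
def spark_sql_credentials(username, secret, databricks=False):
    """Return the User/Password pair to enter in the connector form.

    For Databricks, User is the literal string 'token' and Password is a
    personal access token; otherwise the regular username and password apply.
    """
    if not secret:
        raise ValueError("a password or personal access token is required")
    user = "token" if databricks else username
    return {"User": user, "Password": secret}

if __name__ == "__main__":
    # Placeholder values, not real credentials.
    print(spark_sql_credentials("analyst", "example-password"))
    print(spark_sql_credentials(None, "example-access-token", databricks=True))
```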

Data Import Information

Via Browsing

  • Browse to a table and click "Select" to import it.

Via SQL Query

  • Supports importing with a valid SQL SELECT query.
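
As an illustration of the kind of query this import path expects, here is a small Python sketch with a sample SELECT statement and a rough pre-check. The table and column names are hypothetical, and `looks_like_select` is only a sanity check written for this example; it does not reproduce whatever validation the connector itself performs.

```python
import re

def looks_like_select(query):
    """Rough pre-check: one statement whose first keyword is SELECT."""
    text = query.strip().rstrip(";").strip()
    # Reject anything with a remaining ';' (possible second statement).
    # Note: this also rejects a ';' inside a string literal.
    return bool(re.match(r"(?i)select\b", text)) and ";" not in text

# Hypothetical table and column names.
QUERY = """
SELECT order_id, customer_id, total
FROM sales.orders
WHERE order_date >= '2021-01-01'
"""

if __name__ == "__main__":
    print(looks_like_select(QUERY))
```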
