
Google BigQuery Connector Documentation

User Persona: Paxata User - Paxata Admin - Data Source Admin - IT/DevOps

Note: This document covers all configuration fields available during Connector setup. Your Admin may have already filled out some fields at an earlier configuration step, and those fields may not be visible to you. For more information on Paxata’s Connector Framework, see the Connector Framework documentation.

Also: Your Admin may have named this Connector something else in the list of Data Sources.

Configuring Paxata

This connector allows you to connect to BigQuery for importing and exporting available data. The fields you are required to set up here depend on how the connector was configured by your administrator.

General

  • Name: Name of the data source as it will appear to users in the UI.
  • Description: Description of the data source as it will appear to users in the UI.

Something to consider: You may connect Paxata to multiple BigQuery accounts, so a descriptive name can help users identify the appropriate data source.

BigQuery Configuration

  • OAuth Verifier Key: The verifier key used to authenticate with BigQuery. To obtain the verifier key, click "Test Data Source" and follow the link to grant access to BigQuery. After allowing access, you will be redirected to a page that displays an access code. Copy the code into this field.

  • Profile: The ID of the GCP Project to which you will connect.

  • Automatically Create Table (optional): If enabled, Paxata will drop any existing table whose name matches the name of the exported dataset and recreate the table using the exported dataset. If disabled, Paxata will expect the table to already exist and will try to export to it.

Google Cloud Storage Configuration for Export

These fields are required to export to BigQuery. If you intend to only import, you can leave them blank.
Note: Either provide both fields or leave both blank.

  • Google Cloud Storage Bucket Name: Google Cloud Storage bucket name to be used as a staging area for export.

  • Google Cloud Storage JSON Web Token: Content of JSON Web Token (JWT) to be used to connect to Google Cloud Storage.
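
The both-or-neither rule for the two export fields can be sketched as a small check. This is only an illustration of the rule stated above, not Paxata's own validation; the function name check_export_fields and its parameters are hypothetical.

```python
def check_export_fields(bucket_name: str, jwt_content: str) -> bool:
    """Return True if export is configured, False if the setup is import-only.

    Hypothetical helper: it only illustrates the rule that the bucket name
    and the JSON Web Token must be provided together or left blank together.
    """
    has_bucket = bool(bucket_name.strip())
    has_jwt = bool(jwt_content.strip())
    if has_bucket != has_jwt:
        # Exactly one of the two fields was filled in.
        raise ValueError(
            "Provide both the bucket name and the JSON Web Token, "
            "or leave both blank."
        )
    return has_bucket
```

Leaving both values blank returns False (import only), filling in both returns True (export enabled), and filling in only one raises an error.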

Web Proxy

If you connect to BigQuery through a proxy server, these fields define the proxy details.

  • Web Proxy: Select 'None' if no proxy is required, or 'Proxied' if the connection to BigQuery should be made via a proxy server. If you select 'Proxied', complete the following fields to enable the proxied connection.
  • Proxy Host: The hostname or IP address of the proxy server.
  • Proxy Port: The port of the proxy server.
  • Proxy Username and Proxy Password: User credentials for an authenticated proxy connection. Leave these blank for an unauthenticated proxy connection.

Data Import Information

Via Browsing

  • View datasets and tables within the project specified in your configuration. The project will appear as the top-level directory in the browsing view.

  • Browse to a table within a dataset and "Select" the table for import.

Via SQL Query

Usage

Each part of a table name in a query (project, dataset, and table) must be enclosed in its own pair of backticks, with the dot separators outside the backticks:

Will work:

Valid Syntax
SELECT * FROM `my-project`.`paxata`.`test`

Will not work:

Invalid Syntax
SELECT * FROM `my-project.paxata.test`
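
If you build queries programmatically before pasting them into Paxata, a small helper can produce the valid form, with each name wrapped in its own backticks and the dots left outside. This is a minimal sketch; the function name quote_table_reference is hypothetical and is not part of Paxata or BigQuery.

```python
def quote_table_reference(project: str, dataset: str, table: str) -> str:
    """Build a table reference with each part in its own backticks.

    Hypothetical helper: it only formats the names and does not check
    that the project, dataset, or table actually exists.
    """
    parts = [project, dataset, table]
    for part in parts:
        # Reject empty names and names that would break the quoting.
        if not part or "`" in part:
            raise ValueError(f"Invalid name: {part!r}")
    return ".".join(f"`{part}`" for part in parts)


query = f"SELECT * FROM {quote_table_reference('my-project', 'paxata', 'test')}"
print(query)  # SELECT * FROM `my-project`.`paxata`.`test`
```

The printed query matches the valid example above.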