(warning) The Data Prep (Paxata) documentation is now available on the DataRobot public documentation site. See the Data Prep section for user documentation and connector information. After the 2021.2 SP1 release, the content on this site will be removed and replaced with a link to the DataRobot public documentation site.

Google Cloud Storage Connector Documentation

User Persona: Paxata User - Paxata Admin - Data Source Admin

*Note: This document covers all configuration fields available during Connector setup. Your Admin may have already filled out some fields at an earlier step of configuration, and those fields may not be visible to you. For more information, see the documentation on Paxata’s Connector Framework.

Also: Your Admin may have named this Connector something else in the list of Data Sources.

Configuring Paxata

This connector allows you to connect to Google Cloud Storage (GCS) for browsing and importing objects. The following fields are used to create a connection to the data source.

General

  • Name: Name of the data source as it will appear to users in the UI.

  • Description: Description of the data source as it will appear to users in the UI.

Something to consider: You can connect Paxata to multiple GCS accounts, so a descriptive name helps users identify the appropriate data source.

Google Cloud Storage Configuration

  • Bucket Name: Name of the Google Cloud Storage bucket to connect to. A bucket is a container that holds a collection of objects.

  • Object Prefix: A prefix acts as a folder or sub-folder within the bucket. Specify the prefix you want to use in the bucket. The default value, "/", shows all objects.

  • JSON Web Token: Required to authenticate the account. Provide the content of the JWT file to establish a secure connection with Google Cloud Storage. For more details on the JWT, see the Google documentation for Using OAuth 2.0 for Server to Server Applications.
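
Because the JSON Web Token field takes the raw content of a downloaded key file, a quick local check before pasting can catch a truncated or wrong file. The following is a minimal sketch, assuming the file is a standard Google service account key; the function name and the list of checked fields are illustrative, and Paxata's own validation of this field is not described here.

```python
import json
from typing import List

# Fields that a Google service account key file normally contains.
# Assumption: this list is illustrative, not Paxata's validation logic.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(content: str) -> List[str]:
    """Return a list of problems found in the key content (empty if none)."""
    try:
        key = json.loads(content)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["top-level JSON value is not an object"]

    # Report every required field that is absent or empty.
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not key.get(name)]

    # A service account key identifies itself with type "service_account".
    kind = key.get("type")
    if kind and kind != "service_account":
        problems.append(f"unexpected type: {kind!r}")
    return problems
```

An empty list means the content has the shape of a service account key; it does not prove that the key is still active or that its role grants access to the bucket.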

Web Proxy Configuration

  • If you connect to Google Cloud Storage through a proxy server, these fields define the proxy details.

    • Web Proxy: Select 'None' if no proxy is required, or 'Proxied' if the connection to Google Cloud Storage should be made via a proxy server. When 'Proxied' is selected, the following fields configure the proxied connection.

    • Proxy host: The host name or IP address of the web proxy server.

    • Proxy port: The port on the proxy server that the data source connection uses.

    • Proxy username: The username for the proxy server.

    • Proxy password: The password for the proxy server.
      *Leave username & password blank for an unauthenticated proxy connection.
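
To show how the four proxy fields relate to each other, here is a rough sketch that combines them into a conventional proxy URL. The function name is hypothetical, and how Paxata passes these values to its HTTP client internally is an assumption; the sketch only mirrors the rule above that blank credentials mean an unauthenticated proxy.

```python
from urllib.parse import quote


def build_proxy_url(host: str, port: int, username: str = "", password: str = "") -> str:
    """Build an HTTP proxy URL; blank credentials mean an unauthenticated proxy."""
    if not host:
        raise ValueError("proxy host is required")
    if not 1 <= port <= 65535:
        raise ValueError("proxy port must be between 1 and 65535")

    credentials = ""
    if username:
        # Percent-encode so characters such as '@' or ':' cannot break the URL.
        credentials = quote(username, safe="")
        if password:
            credentials += ":" + quote(password, safe="")
        credentials += "@"
    return f"http://{credentials}{host}:{port}"
```

For example, `build_proxy_url("proxy.example.com", 3128)` produces a URL without a credentials part, while supplying a username and password places them, percent-encoded, before the host.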

How to Authenticate with Google

The Paxata Google Cloud Storage Connector leverages Service Account authentication.

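To access Google Cloud Storage using Paxata, you need the JSON credential for a service account. Either create a new service account (option 1) or download the credential for an existing one (option 2):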

  1. Create a Google Service Account for the Cloud Storage service:
    1. Open the list of credentials in the Google Cloud Platform Console.
      1. https://console.cloud.google.com/apis/credentials
    2. Click Create credentials
    3. Select Service account key
      1. A Create service account key window opens.
    4. Click the drop-down box below Service account, then click New service account
    5. Enter a name for the service account in Name
    6. Choose a Cloud Storage Role that grants the service account the desired level of access.
    7. Use the default Service account ID or generate a different one.
    8. Select the Key type: JSON
    9. Click Create
      1. A Service account created window is displayed, and the private key for the Key type you selected is downloaded automatically.
      2. Note the location of the downloaded credential file.
    10. Click Close
  2. Download the JSON credential for an existing Service Account for the Cloud Storage service:
    1. Log in to the Google Console using the end-user account: https://console.cloud.google.com/apis/credentials
    2. Ensure that the correct Project is selected in the dropdown list.
    3. Scroll down to the "OAUTH 2.0 client IDs" section.
    4. Click the Name of the existing ID that you plan to use in the Connector.
    5. On the resulting page, click the "DOWNLOAD JSON" link.
    6. Note the location of the downloaded credential file.

For additional reference, please see: https://cloud.google.com/storage/docs/authentication#generating-a-private-key
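
In Google's server-to-server OAuth 2.0 flow, the client uses the service account key to sign a short-lived JWT and exchanges it for an access token. As a rough illustration of what that token carries, the sketch below assembles the header and claim set and encodes them, leaving out the RS256 signature, which needs a cryptography library. The function names are hypothetical, the read-only scope is just one possible choice, and whether Paxata builds its token exactly this way is an assumption.

```python
import base64
import json

# Google's OAuth 2.0 token endpoint, used as the JWT audience.
TOKEN_URI = "https://oauth2.googleapis.com/token"
# Read-only Cloud Storage scope; chosen here only for illustration.
READ_ONLY_SCOPE = "https://www.googleapis.com/auth/devstorage.read_only"


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def unsigned_jwt(client_email: str, issued_at: int, lifetime_s: int = 3600) -> str:
    """Return '<header>.<claims>' for a service account; the signature is omitted."""
    header = {"alg": "RS256", "typ": "JWT"}
    claims = {
        "iss": client_email,          # client_email from the key file
        "scope": READ_ONLY_SCOPE,
        "aud": TOKEN_URI,
        "iat": issued_at,             # seconds since the Unix epoch
        "exp": issued_at + lifetime_s,
    }
    parts = [json.dumps(p, separators=(",", ":")).encode("utf-8") for p in (header, claims)]
    return ".".join(b64url(p) for p in parts)
```

A real request would append a third segment: the RS256 signature of this string, computed with the `private_key` from the key file. The token is then posted to the token endpoint in exchange for an access token.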

Data Import Information

Via Browsing

Browse directories and files within the configured bucket and prefix.
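
Cloud Storage keeps objects in a flat namespace, so the directories seen while browsing are derived from "/" separators in object names. The sketch below illustrates that idea on a plain list of names; the function name and sample object names are hypothetical, and treating "/" as the bucket root follows the documented default for Object Prefix rather than any known Paxata internals.

```python
from typing import Iterable, List, Tuple


def browse(object_names: Iterable[str], prefix: str = "/") -> Tuple[List[str], List[str]]:
    """Split the objects under `prefix` into (folders, files), one level deep."""
    # Assumption: "/" means the bucket root, matching the documented default.
    if prefix == "/":
        prefix = ""

    folders, files = set(), set()
    for name in object_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if not rest:
            continue  # placeholder object for the prefix itself
        head, sep, _ = rest.partition("/")
        if sep:
            folders.add(prefix + head + "/")  # deeper objects collapse into a folder
        else:
            files.add(name)
    return sorted(folders), sorted(files)


names = ["sales/2021/q1.csv", "sales/2021/q2.csv", "sales/readme.txt", "hr/staff.csv", "top.csv"]
print(browse(names))            # (['hr/', 'sales/'], ['top.csv'])
print(browse(names, "sales/"))  # (['sales/2021/'], ['sales/readme.txt'])
```

Setting a narrower Object Prefix therefore limits browsing to one branch of the bucket without changing the objects themselves.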