(warning) The Data Prep (Paxata) documentation is now available on the DataRobot public documentation site. See the Data Prep section for user documentation and connector information. After the 2021.2 SP1 release, the content on this site will be removed and replaced with a link to the DataRobot public documentation site.

Hortonworks HDP2 HDFS Connector Documentation

User Persona: Paxata Admin - Data Source Admin - IT/DevOps

Availability: Please note this Connector is not available to Paxata SaaS customers.

Note: This document covers all configuration fields available during Connector setup. Some fields may have already been filled out by your Admin at an earlier step of configuration and may not be visible to you. For more information on Paxata’s Connector Framework, please see here.

Also: Your Admin may have named this Connector something else in the list of Data Sources.

Configuring Paxata

This Connector allows you to connect to a Hortonworks HDP 2.6.5 Hadoop Distributed File System (HDFS) for import and export. The following fields define the connection parameters.

Note: Configuring this Connector requires file system access on the Paxata Server and a core-site.xml with the Hadoop cluster configuration. Please reach out to your Customer Success representative for assistance with this step.
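Before handing a core-site.xml to the Paxata Server, it can help to confirm that the file parses and names the cluster you expect. The following is a minimal sketch, not part of Paxata: the helper name read_hadoop_properties and the sample hostname are hypothetical, and only the standard Hadoop property fs.defaultFS is assumed to be present.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; a real core-site.xml comes from your Hadoop cluster.
SAMPLE_CORE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>
"""


def read_hadoop_properties(xml_text):
    """Return the <property> name/value pairs of a Hadoop config file as a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


if __name__ == "__main__":
    props = read_hadoop_properties(SAMPLE_CORE_SITE)
    print(props.get("fs.defaultFS"))
```

To check a real file, read its contents and pass the text to read_hadoop_properties in place of the sample.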

General

  • Name: Name of the data source as it will appear to users in the UI.
  • Description: Description of the data source as it will appear to users in the UI.

Something to consider: You can connect Paxata to multiple HDFS clusters, so a descriptive name helps users identify the appropriate data source.

Simple Configuration (only for Simple authentication)

  • Username: The application web server will connect to your HDFS cluster as the username you provide here.

Configuration

  • Data Store Root Directory: The parent directory on your cluster that the Connector reads from and writes to for import and export operations. Sub-directories of the root are also available for import and export.
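One way to picture the scope of the root directory is a check that a path is either the root itself or one of its sub-directories. This is an illustrative sketch only: the function is_within_root and the example paths are hypothetical, and whether the Connector enforces the boundary in this way is an assumption.

```python
import posixpath


def is_within_root(root, candidate):
    """Return True if candidate is the root directory or lies beneath it."""
    root_norm = posixpath.normpath(root)
    cand_norm = posixpath.normpath(candidate)
    if root_norm == "/":
        # Every absolute path lies beneath the file system root.
        return cand_norm.startswith("/")
    # Compare on a path-separator boundary so /data/paxata_old does not match.
    return cand_norm == root_norm or cand_norm.startswith(root_norm + "/")


if __name__ == "__main__":
    print(is_within_root("/data/paxata", "/data/paxata/exports/2021"))
    print(is_within_root("/data/paxata", "/data/paxata_old"))
```

Normalizing both paths first means a value such as /data/paxata/../secret is treated as outside the root.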

Kerberos Configuration

The following parameters are required for Kerberos authentication. 

  • Principal: Kerberos Principal.
  • Realm: Kerberos Realm.
  • KDC Hostname: Kerberos Key Distribution Center Hostname.
  • Kerberos Configuration File: Fully-qualified path of the Kerberos configuration file on the web server.
  • Keytab File: Fully-qualified path of the Kerberos keytab file on the web server.
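The Kerberos fields above can be sanity-checked before they are entered. The sketch below is hypothetical and not part of Paxata: the class KerberosSettings, the function validate_kerberos_settings, and all example values are assumptions, and it assumes a Unix-like web server where a fully-qualified path begins with "/".

```python
import posixpath
from dataclasses import dataclass, fields


@dataclass
class KerberosSettings:
    """Hypothetical container mirroring the five Kerberos fields."""
    principal: str
    realm: str
    kdc_hostname: str
    kerberos_config_file: str
    keytab_file: str


def validate_kerberos_settings(settings):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    # Every field is required for Kerberos authentication.
    for field in fields(settings):
        if not getattr(settings, field.name).strip():
            problems.append(f"{field.name} is required")
    # Both file fields must be fully-qualified (absolute) paths.
    for name in ("kerberos_config_file", "keytab_file"):
        value = getattr(settings, name)
        if value.strip() and not posixpath.isabs(value):
            problems.append(f"{name} must be a fully-qualified path")
    return problems


if __name__ == "__main__":
    example = KerberosSettings(
        principal="paxata",
        realm="EXAMPLE.COM",
        kdc_hostname="kdc.example.com",
        kerberos_config_file="/etc/krb5.conf",
        keytab_file="/etc/security/keytabs/paxata.keytab",
    )
    print(validate_kerberos_settings(example))
```

The checks cover only the form of the values; they do not confirm that the files exist or that the KDC is reachable.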

The Proxy User and Use Application User options allow you to specify the account to impersonate. See this documentation for more information about impersonation in HDFS. You have three options here: use a specific proxy user, a proxy user with modifiers, or the individual application user.

  • Proxy User: Here you can either specify the user account that will be impersonated for all connections or check the Use Application User box to impersonate the user account of the individual Paxata user who runs the connector. Note that the Proxy User field is not enabled if Use Application User is checked. Entering ${user.name} as the proxy user works similarly to selecting Use Application User but allows for more flexibility because you can add modifiers or additional text. For example:
    • To add a domain to the user’s credentials, enter \domain_name\${user.name} in the Proxy User field. Paxata will pass the username and the domain.
      • Example: \Accounts\${user.name} results in Accounts\Joe (assuming Joe is the username).
    • To apply a text modifier to the username, append .modifier to the key inside the braces, in the form ${user.name.modifier}. The acceptable modifiers are: toLower, toUpper, toLowerCase, toUpperCase, and trim.
      • Example: ${user.name.toLowerCase} converts Joe into joe (assuming Joe is the username).
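The substitution rules above can be sketched as a small function. This is an illustration of the described behavior, not Paxata's implementation: the name resolve_proxy_user is hypothetical, the mapping of each modifier to a Python string method is an assumption based on the modifier names, and the domain-prefix handling is not modeled.

```python
import re

# Modifier names come from the list above; what each one does is assumed
# from its name.
MODIFIERS = {
    "toLower": str.lower,
    "toLowerCase": str.lower,
    "toUpper": str.upper,
    "toUpperCase": str.upper,
    "trim": str.strip,
}

# Matches ${user.name} or ${user.name.<modifier>}.
PLACEHOLDER = re.compile(r"\$\{user\.name(?:\.(\w+))?\}")


def resolve_proxy_user(template, username):
    """Expand ${user.name} placeholders in a Proxy User value (hypothetical helper)."""

    def substitute(match):
        modifier = match.group(1)
        if modifier is None:
            return username
        if modifier not in MODIFIERS:
            raise ValueError(f"unsupported modifier: {modifier}")
        return MODIFIERS[modifier](username)

    return PLACEHOLDER.sub(substitute, template)


if __name__ == "__main__":
    print(resolve_proxy_user("${user.name.toLowerCase}", "Joe"))
    print(resolve_proxy_user("${user.name}", "Joe"))
```

Any text outside the placeholder is left unchanged, which is what makes this form more flexible than the Use Application User checkbox.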

Data Import Information

  • Via Browsing: Supported
  • Via SQL Query: Not supported