(warning) The Data Prep (Paxata) documentation is now available on the DataRobot public documentation site. See the Data Prep section for user documentation and connector information. After the 2021.2 SP1 release, the content on this site will be removed and replaced with a link to the DataRobot public documentation site.

Amazon Athena Connector Documentation

User Persona: Paxata User - Paxata Admin - Data Source Admin OR IT/DevOps

*Note: This document covers all configuration fields available during Connector setup. Some fields may have already been filled out by your Admin at an earlier step of configuration and may not be visible to you. For more information on Paxata’s Connector Framework, please see here.

Also: Your Admin may have named this Connector something else in the list of Data Sources.

Configuring Paxata

This connector allows you to connect to AWS Athena as an import source. The fields you are required to set up on the data source depend on how the connector was configured by your administrator.

General

  • Name: Name of the data source as it will appear to users in the UI.

  • Description: Description of the data source as it will appear to users in the UI.

Something to consider: You can connect Paxata to multiple AWS Athena instances, so a descriptive name helps users identify the appropriate data source.

Amazon Athena Configuration

  • Athena Region: The AWS region that hosts the Athena instance you are connecting to.

  • Access Key: AWS account access key.

  • Secret Key: AWS account secret key.

Query Results Storage Configuration

  • S3 Bucket Name: The name of the S3 bucket in which Athena will store query results.

  • S3 Object Prefix: Prefix under which Athena will store query results within the specified S3 bucket. See How do I use folders in an S3 Bucket for more information on prefixes.

  • Encryption Type: AWS server-side encryption type.

Note about query results: Athena stores the result of every query in the configured S3 bucket. This is how Athena is designed to function and is expected behavior. When you use Athena to import to Paxata, the connector deletes those query results by default when the connection closes, so that you keep one instance of the query result (in Paxata) rather than two. If you want the query results to remain available in S3, run the query directly in Athena and then import the resulting file to Paxata from S3.
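As a rough illustration of how the three storage fields above fit together, the sketch below combines a bucket name, an object prefix, and an encryption type into a dictionary shaped like the `ResultConfiguration` structure of the Athena API. The function name `build_result_configuration` and the sample bucket and prefix are hypothetical, and the encryption values shown are the Athena API option names, which may not match the labels the connector displays. This is not the connector's own implementation.

```python
from typing import Optional

# Encryption option names used by the Athena API (assumption: the
# connector's "Encryption Type" field may use different labels).
VALID_ENCRYPTION = ("SSE_S3", "SSE_KMS", "CSE_KMS")


def build_result_configuration(
    bucket: str,
    prefix: str = "",
    encryption: Optional[str] = None,
    kms_key: Optional[str] = None,
) -> dict:
    """Combine the storage fields into an Athena-style result configuration."""
    if not bucket:
        raise ValueError("S3 bucket name is required")

    # Normalize the prefix so the location always ends with a single slash.
    cleaned = prefix.strip("/")
    location = f"s3://{bucket}/{cleaned}/" if cleaned else f"s3://{bucket}/"

    config = {"OutputLocation": location}
    if encryption is not None:
        if encryption not in VALID_ENCRYPTION:
            raise ValueError(f"Unsupported encryption type: {encryption}")
        enc = {"EncryptionOption": encryption}
        if kms_key:
            # A KMS key is only meaningful for the KMS-based options.
            enc["KmsKey"] = kms_key
        config["EncryptionConfiguration"] = enc
    return config


# Hypothetical bucket and prefix, for illustration only.
example = build_result_configuration("my-results-bucket", "/paxata/athena/", "SSE_S3")
```

Here `example["OutputLocation"]` is `s3://my-results-bucket/paxata/athena/`, the folder-like location under which Athena would write query results.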

Web Proxy Configuration

  • If you connect to AWS Athena through a proxy server, these fields define the proxy details.

    • Web Proxy: 'None' if no proxy is required, or 'Proxied' if the connection to AWS Athena should be made via a proxy server. When 'Proxied' is selected, the following fields define the proxied connection.

    • Proxy host: The host name or IP address of the web proxy server.

    • Proxy port: The port on the proxy server that accepts connections from the data source.

    • Proxy username: The username for the proxy server.

    • Proxy password: The password for the proxy server.
      *Leave username & password blank for an unauthenticated proxy connection.
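To make the relationship between the proxy fields concrete, the sketch below assembles them into a conventional proxy URL. The function name `build_proxy_url`, the `http` scheme, and the sample host are assumptions made for illustration; the connector may represent the proxy settings differently internally.

```python
from typing import Optional
from urllib.parse import quote


def build_proxy_url(
    mode: str,
    host: str = "",
    port: int = 0,
    username: str = "",
    password: str = "",
) -> Optional[str]:
    """Return a proxy URL for 'Proxied' mode, or None when mode is 'None'."""
    if mode == "None":
        return None
    if mode != "Proxied":
        raise ValueError(f"Unknown web proxy mode: {mode}")
    if not host or not port:
        raise ValueError("Proxy host and proxy port are required when proxied")

    # Blank username and password mean an unauthenticated proxy connection.
    credentials = ""
    if username:
        # Percent-encode so characters such as '@' or ':' stay unambiguous.
        credentials = quote(username, safe="")
        if password:
            credentials += ":" + quote(password, safe="")
        credentials += "@"

    return f"http://{credentials}{host}:{port}"


# Hypothetical host and credentials, for illustration only.
unauthenticated = build_proxy_url("Proxied", "proxy.example.com", 8080)
authenticated = build_proxy_url("Proxied", "proxy.example.com", 8080, "svc", "p@ss:w")
```

The unauthenticated case yields `http://proxy.example.com:8080`, while the authenticated case embeds the percent-encoded credentials before the host.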

Data Import Information

Via Browsing

Browsing is supported for this Connector and uses Athena queries to generate the browseable hierarchy. Please see the note below about Athena’s cost structure.

Via SQL Query

Please find the SQL reference here

Best Practices

When using Athena, you are charged for each query that you run. The amount that you are charged is based on the amount of data scanned by the query. For more information, see Amazon Athena Pricing.
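Because the charge depends on the amount of data scanned, a quick estimate can help when deciding whether to browse or query a large table. The sketch below is a back-of-the-envelope calculation only: the price per terabyte, the rounding to whole megabytes, and the per-query minimum are assumptions taken as defaults here, and the function name `estimate_query_cost` is hypothetical. Check Amazon Athena Pricing for the actual rates in your region.

```python
import math

MB = 1024 ** 2
TB = 1024 ** 4

# Assumed values; confirm against Amazon Athena Pricing before relying on them.
ASSUMED_PRICE_PER_TB_USD = 5.00
ASSUMED_MIN_BILLED_BYTES = 10 * MB


def estimate_query_cost(
    bytes_scanned: int,
    price_per_tb: float = ASSUMED_PRICE_PER_TB_USD,
    min_billed_bytes: int = ASSUMED_MIN_BILLED_BYTES,
) -> float:
    """Estimate the cost in USD of one query from the bytes it scans."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    if bytes_scanned == 0:
        # Assumption: a query that scans no data incurs no scan charge.
        return 0.0

    # Round up to whole megabytes, then apply the assumed minimum.
    rounded = math.ceil(bytes_scanned / MB) * MB
    billed = max(rounded, min_billed_bytes)
    return billed / TB * price_per_tb


# Under these assumptions, scanning exactly one terabyte costs 5.00 USD.
one_tb_cost = estimate_query_cost(TB)
```

Since browsing also issues Athena queries, the same estimate applies to navigating the data source hierarchy, not only to SQL imports.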