(warning) The Data Prep (Paxata) documentation is now available on the DataRobot public documentation site. See the Data Prep section for user documentation and connector information. After the 2021.2 SP1 release, the content on this site will be removed and replaced with a link to the DataRobot public documentation site.

SFTP Connector Documentation

User Persona: Paxata User - Paxata Admin - Data Source Admin - IT/DevOps

Note: This document covers all configuration fields available during Connector setup. Your Admin may have already filled in some fields at an earlier step of configuration, and those fields may not be visible to you. For more information on Paxata’s Connector Framework, see the Connector Framework documentation.

Also: Your Admin may have named this Connector something else in the list of Data Sources.

Configuring Paxata

This Connector allows you to connect to an SSH File Transfer Protocol (SFTP) Server for Library imports and exports. The following fields are used to define the connection parameters.


  • Name: Name of the data source as it will appear to users in the UI.
  • Description: Description of the data source as it will appear to users in the UI.

Something to consider: You may connect Paxata to multiple SFTP servers, so a descriptive name helps users identify the appropriate data source. If you are a Paxata SaaS customer, please inform Paxata DevOps how you would like this set.


If the SFTP Host section appears in the Add Source/Edit Source form, provide the information used to locate and connect to the SFTP host.

  • SFTP Hostname: You can use either the fully qualified hostname, including the domain name, or the IP address of the SFTP server.
  • SFTP Port: The TCP port on which the SFTP server listens. SFTP runs over SSH, which uses port 22 by default.
  • Automatic Host Key Verification: Automatically accept the host key from the SFTP server.
    • Selected: Enables the SFTP Connector to automatically trust the host key presented by your SFTP server. This is equivalent to setting StrictHostKeyChecking=no in SSH.
    • Deselected (default setting): Disables automatic trust of the host key for the server given in SFTP HOSTNAME. This is the default because it is the more secure configuration.
  • Keep Alive: Enable/Disable session activity to prevent a timeout.
    • Selected (default setting): Enables periodic background communication between the SFTP Connector and the SFTP server to keep the server from closing the connection during browse, import, and export.
    • Deselected: The duration of the connection is managed by the SFTP server configuration, and the server may terminate idle connections. In this configuration, avoid long periods of inactivity while browsing to import or export data.
  • Data Compression: Enable data compression during transfer.
    • Selected (default setting): Enables ZLIB compression of data during transfer between the SFTP server and Paxata, which increases transfer speed for most datasets. If ZLIB compression cannot be negotiated between Paxata and the server, the connection automatically falls back to uncompressed transfer.
    • Deselected: Disables ZLIB compression of data during transfer between the SFTP server and Paxata.
  • Socket Timeout Seconds: The number of seconds to wait for an SFTP command (list directory, create directory, logout, and so on) to complete. The default value is 30 seconds. To allow a longer wait, increase this value.
    • This option is most likely to be needed when the SFTP server directories contain very large numbers of data files.
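
For readers who know the OpenSSH command-line client, the host settings above can be related to roughly comparable `sftp` options. The following is a minimal sketch, not the Connector's implementation: the `SftpHostSettings` class, the `to_openssh_sftp_args` function, and the 15-second keep-alive interval are all hypothetical names and values chosen for illustration. Only the defaults (port 22, host key verification deselected, keep alive and compression selected, 30-second timeout) come from this document.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SftpHostSettings:
    """Hypothetical container mirroring the SFTP Host fields described above."""
    hostname: str
    port: int = 22                            # SSH default port
    auto_host_key_verification: bool = False  # deselected by default
    keep_alive: bool = True                   # selected by default
    data_compression: bool = True             # selected by default
    socket_timeout_seconds: int = 30          # default stated in this document


def to_openssh_sftp_args(settings: SftpHostSettings, username: str,
                         keep_alive_interval: int = 15) -> List[str]:
    """Build a roughly comparable OpenSSH `sftp` command line.

    keep_alive_interval is an assumed value; the Connector's real interval
    is not documented here. The socket timeout is not mapped because the
    OpenSSH client has no directly equivalent per-command option.
    """
    args = ["sftp", "-P", str(settings.port)]
    strict = "no" if settings.auto_host_key_verification else "yes"
    args += ["-o", f"StrictHostKeyChecking={strict}"]
    compression = "yes" if settings.data_compression else "no"
    args += ["-o", f"Compression={compression}"]
    if settings.keep_alive:
        args += ["-o", f"ServerAliveInterval={keep_alive_interval}"]
    args.append(f"{username}@{settings.hostname}")
    return args


if __name__ == "__main__":
    # Example host and user are placeholders.
    defaults = SftpHostSettings(hostname="sftp.example.com")
    print(" ".join(to_openssh_sftp_args(defaults, "alice")))
```

With the default settings, the sketch produces a command that keeps strict host key checking on, which matches the more secure default described above.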


  • Root Directory: Defines the top-level directory to be presented in Paxata's browse interfaces for import and export. Users can see files and directories within this directory in the browsing interface.
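
To illustrate how a browse path relates to the Root Directory, here is a minimal sketch that maps a path chosen in a browsing interface onto an absolute server path beneath the root. The function name `resolve_under_root` and the example paths are hypothetical, and the rejection of paths that climb above the root is an assumption made for the sketch. This document does not state how the Connector itself handles such paths.

```python
import posixpath


def resolve_under_root(root: str, relative_path: str) -> str:
    """Return the absolute server path for a path browsed relative to root.

    Raises ValueError if the normalized result would fall outside root
    (an assumed policy, not documented Connector behavior).
    """
    root = posixpath.normpath(root)
    candidate = posixpath.normpath(
        posixpath.join(root, relative_path.lstrip("/"))
    )
    inside = candidate == root or candidate.startswith(root.rstrip("/") + "/")
    if not inside:
        raise ValueError(
            f"{relative_path!r} is outside the root directory {root!r}"
        )
    return candidate


if __name__ == "__main__":
    print(resolve_under_root("/data/exports", "sales/2021.csv"))
```

SFTP paths use forward slashes regardless of the client platform, which is why the sketch uses `posixpath` rather than `os.path`.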


The SFTP connector can authenticate using password authentication or SSH keys (with or without a Passphrase). Here are the options:

  • User Credentials: This is a username and password combination.
    • USERNAME: The username for authenticating with the SFTP server.
    • PASSWORD: The password associated with the provided username.
  • SSH Key Without Passphrase: Enter the username and paste in the SSH private key.
    • USERNAME: The username for authenticating with the SFTP server.
    • SSH PRIVATE KEY: The contents of the SSH private key associated with the username.
  • SSH Key With Passphrase: Enter the username, paste in the SSH private key, and enter the passphrase.
    • USERNAME: The username for authenticating with the SFTP server.
    • SSH PRIVATE KEY: The contents of the SSH private key associated with the username.
    • PASSPHRASE: The passphrase that encrypts your private key.
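
The three authentication options above differ only in which fields they require. As a minimal sketch of that relationship, the snippet below checks a set of entered values against the required fields for each option. The `AuthMode` enum, the `REQUIRED_FIELDS` table, and the `missing_fields` function are hypothetical names for illustration; only the option names and field names come from this document.

```python
from enum import Enum
from typing import Dict, List


class AuthMode(Enum):
    USER_CREDENTIALS = "User Credentials"
    SSH_KEY_WITHOUT_PASSPHRASE = "SSH Key Without Passphrase"
    SSH_KEY_WITH_PASSPHRASE = "SSH Key With Passphrase"


# Fields each option requires, as listed in this document.
REQUIRED_FIELDS = {
    AuthMode.USER_CREDENTIALS: ("USERNAME", "PASSWORD"),
    AuthMode.SSH_KEY_WITHOUT_PASSPHRASE: ("USERNAME", "SSH PRIVATE KEY"),
    AuthMode.SSH_KEY_WITH_PASSPHRASE: ("USERNAME", "SSH PRIVATE KEY", "PASSPHRASE"),
}


def missing_fields(mode: AuthMode, values: Dict[str, str]) -> List[str]:
    """Return the required fields that are absent or blank for the given mode."""
    return [
        name for name in REQUIRED_FIELDS[mode]
        if not values.get(name, "").strip()
    ]


if __name__ == "__main__":
    entered = {"USERNAME": "alice", "SSH PRIVATE KEY": "(key contents)"}
    print(missing_fields(AuthMode.SSH_KEY_WITH_PASSPHRASE, entered))
    # prints ['PASSPHRASE']
```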

Data Import & Export Information

Via Browsing

  • The Connector will present a browsable directory hierarchy starting at the location defined in the ROOT DIRECTORY field. 
  • The Connector also supports Wildcard & Glob importing, which enables users to import multiple SFTP data files into Paxata as a single Dataset.
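
To show the general idea behind wildcard and glob matching, the sketch below selects file names from a directory listing using shell-style patterns from Python's standard `fnmatch` module. This is an illustration only: the exact pattern syntax that Paxata accepts is not described in this document and may differ, and the `match_files` function and the file names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List


def match_files(listing: List[str], pattern: str) -> List[str]:
    """Return the names in listing that match a shell-style pattern, sorted."""
    return sorted(name for name in listing if fnmatchcase(name, pattern))


if __name__ == "__main__":
    listing = [
        "sales_2021_02.csv",
        "sales_2021_01.csv",
        "sales_2020_12.csv",
        "readme.txt",
    ]
    # `*` matches any run of characters within the file name.
    print(match_files(listing, "sales_2021_*.csv"))
    # prints ['sales_2021_01.csv', 'sales_2021_02.csv']
```

In this sketch, every matched file would contribute rows to one combined Dataset, so the files selected by a pattern should share the same layout.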

Via SQL Query

  • As SFTP is a file store, SQL Queries are not supported for this data source. 

Technical Specs

  • We test this Connector against a standard Linux installation of OpenSSH with its default configuration.

FAQ/Troubleshooting/Common Issues

  • Please note that SFTP is as much a protocol as it is a type of storage. If you have an “SFTP Server”, what you really have is a storage location that is exposed over the network using the SSH File Transfer Protocol. This distinction matters because anything (web services, SFTP service providers, etc.) can expose data using this protocol. These services might use different implementations of SFTP, or they may do things behind the scenes that a traditional SFTP server would not. As a result, SFTP servers may have custom behavior that presents challenges either in connecting or in importing data.

    Here’s one example of where this type of variance from standard SFTP caused problems: A customer was using the SFTP Connector to pull data from one of their vendors. The vendor was using a service that exposed data via SFTP but deleted each data file after it was read. When Paxata provides a preview of data upon import, it queries the data source for a small chunk of the data. The service treated that preview request as a read and deleted the file before it could be fully imported.
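
The failure in that example can be reproduced with a toy model. The sketch below is a simulation only, with no real SFTP involved: `DeleteAfterReadStore` is a hypothetical in-memory stand-in for the vendor's service, and `preview_then_import` imitates a client that reads a small preview before requesting the full file.

```python
from typing import Dict, Optional, Tuple


class DeleteAfterReadStore:
    """Toy stand-in for a service that removes a file once it has been read."""

    def __init__(self, files: Dict[str, bytes]):
        self._files = dict(files)

    def read(self, name: str, max_bytes: Optional[int] = None) -> bytes:
        # Any read, even a partial one, deletes the file.
        data = self._files.pop(name)
        return data if max_bytes is None else data[:max_bytes]


def preview_then_import(store: DeleteAfterReadStore, name: str,
                        preview_bytes: int = 16) -> Tuple[bytes, Optional[bytes]]:
    """Read a small preview, then attempt the full import."""
    preview = store.read(name, max_bytes=preview_bytes)
    try:
        full = store.read(name)
    except KeyError:
        full = None  # the file vanished after the preview read
    return preview, full


if __name__ == "__main__":
    store = DeleteAfterReadStore({"orders.csv": b"id,total\n1,9.99\n2,4.50\n"})
    preview, full = preview_then_import(store, "orders.csv", preview_bytes=8)
    print(preview)  # prints b'id,total'
    print(full)     # prints None
```

If a data source behaves this way, the preview step alone consumes the file, so the behavior has to be changed on the service side or the data staged somewhere that tolerates repeated reads.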