Automation

IMPORTANT: This is a legacy feature that lets you automate individual Projects and datasets. Automatic Project Flows (APF), introduced in the 2019.1 release, allows you to intelligently operationalize curated data flows. The APF feature computes the entire sequence of data prep Steps across Paxata Projects, datasets and AnswerSets to produce an end-to-end, automated output Flow for your data. If you currently use the 2018.2 Automation feature and are ready to upgrade your automated jobs to APF, contact customersuccess@paxata.com for assistance.


There are two types of workload automation that reduce the number of repetitive tasks taken to produce AnswerSets: Library automation and Project automation.

  • Library automation
    When you automate a Data Library dataset, you schedule it to automatically pull an update from its source based on a schedule you define. During the automation process, a dataset is updated with new versions of the data using the import and parse options specified when the file was originally uploaded into the Data Library. However, when you set up a dataset for automation, you have the option to modify those parse options.
    Note that you cannot automate datasets on your local system. 

  • Project automation
    When you schedule a Project for automation, you set it up to automatically publish an AnswerSet to the Data Library based on the schedule and parameters you define. The AnswerSet can also be exported to an external data source, for example, AWS S3.

    Note that Project Lenses are essential for Project Automation because they define the publishing points to use for your automated jobs. In order to automate a Project, you must have a Lens defined for each point in the Project where you want to publish data. You must have at least one Lens defined in your Project, otherwise no data can be published. For more information on Lenses, see the article for Project Lenses.

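The purpose of this article is to provide:

  • an overview of the Data Library automation configuration page.
  • steps to set up a Data Library dataset for automation.
  • an overview of the Project automation configuration page.
  • steps to set up a Project for automation.
  • an overview of the Automation dashboard.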

After you configure automation schedules for Data Library files and/or Projects, both are collectively referred to as automation “jobs.” The Automation dashboard provides you with details of all automation schedules and the status of all automation jobs.


Overview of Data Library automation configuration page

To open the Data Library automation configuration page:

  1. Open the Data Library.
  2. Locate the file you want to automate.
  3. Click the "More Actions" button that displays, then select the "Automation" option.
  4. The configuration page opens:



    Job Name and Job Description
    The dataset's name and description are listed in these fields. These are initial default values from when the file was originally imported into the Data Library. They can be changed here by entering new information into the fields. 

    Important: you may also notice a check box labeled "Set me as the owner of this automated schedule". This option appears only if you are neither the person who initially set up this dataset for automation nor the current owner of its automated schedule. Ownership is significant because it provides a way to identify and audit users who are running automated jobs in the system. Typically, this option is used when automation responsibilities are transitioned to a new person in an organization. If you take ownership of an automation job, you must have all of the permissions required for every operation the automation performs.

    Schedules
    Any upcoming schedules for the dataset are displayed here. The "Add" button allows you to set up new schedules. The "Deactivate" link in this panel allows you to indefinitely suspend all scheduled jobs for this dataset until you return and click the "Reactivate" button. To set up schedules, see the section below: Steps to set up a Data Library dataset for automation.

    Notifications
    Email notifications can be sent to notify users of either a successful upload into the Data Library or errors that have occurred. To set up notifications, see the section below: Steps to set up a Data Library dataset for automation.

    Importing from
    These are the connection parameters inherited from the most recent upload of the dataset. To change these connection parameters, manually upload a new version of the file to the Data Library with new parameters. Automation will then use the new connection parameters in its next scheduled update.

    Import parsing options
    For file-based datasets, the import parse options are displayed below the connection details. The import options are inherited from the most recent version of the dataset but are editable here. Note: if you manually import another version of this dataset into the Data Library, the manual upload uses the parse options you select at that time; it does not inherit the parse options from the automated version.

Steps to set up a Data Library dataset for automation

Set up schedules
Click "Add" to set up a new time for this dataset to be updated by automation. The default setting for dataset automation frequency is to "Repeat" on the time and day you specify here. The "Repeat" button is a toggle that you can click to switch the automation to run "Once" at the time you specify.

Steps for "Repeat" set up:

  1. Use the up and down arrows to adjust the time.
  2. Toggle the PM or AM button to select the correct period.
  3. Select the frequency: week, day or month. Note that "week" is the default. Click in the field to make a different selection.
  4. Depending on your frequency selection, specify the day of the week or date in the month.
  5. Click "Okay" to add the schedule. Your newly added schedule then appears. Click the pencil icon to edit it or the "x" button to delete it.

Note:

  • The time you select here is based on your current time zone.
  • The time, day or date you select here must be in the future. For example, if it is currently 1pm on Monday and you set up the automation to run at 10am every Monday, then automation will not run today for this file.
  • Datasets that are on your own local system cannot be automated.
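
The "must be in the future" rule for a repeating schedule can be sketched in a few lines of Python. This is only an illustration of the behavior described in the note above, not the product's scheduler; the function name next_weekly_run and its parameters are hypothetical.

```python
from datetime import datetime, timedelta

def next_weekly_run(now: datetime, weekday: int, hour: int, minute: int = 0) -> datetime:
    """Return the first occurrence of (weekday, hour:minute) strictly after `now`.

    weekday follows datetime.weekday(): Monday is 0, Sunday is 6.
    """
    days_ahead = (weekday - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=hour, minute=minute, second=0, microsecond=0
    )
    if candidate <= now:
        # Today's slot has already passed, so the first run falls next week.
        candidate += timedelta(days=7)
    return candidate

# It is 1pm on Monday and the schedule is 10am every Monday.
now = datetime(2024, 1, 1, 13, 0)  # 1 January 2024 is a Monday
first_run = next_weekly_run(now, weekday=0, hour=10)
print(first_run)  # 2024-01-08 10:00:00
```

In this example the first run lands on the following Monday, which matches the note that automation will not run today for the file.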

Steps for "Once" set up:

  1. Click in the date field to open a calendar picker.
  2. Use the up and down arrows to adjust the time.
  3. Toggle the PM or AM button to select the correct period.
  4. Click "Okay" to add the schedule. Your newly added schedule then appears. Click the pencil icon to edit it or the "x" button to delete it.

Note:

  • The time you select for a schedule is based on your current time zone.
  • When configuring a file’s automation to run only "Once", do not set the job’s start time too near to the current time. Your local computer’s clock may not be precisely in sync with the web server that will process the job. If your local computer’s clock is running behind the web server’s clock, the time you specify for the job may have already passed on the web server. In this case, your job will not start.
  • If you want to simply test one automated run of this dataset, use the "Add to Queue" feature instead of setting it up to run "Once". For details on this feature, see the section below: Save your automation configuration settings.
  • Datasets that are on your own local system cannot be automated.
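
One way to reason about the clock-skew warning above is to require a margin between your local time and a "Once" start time. The sketch below is an assumption-laden illustration: the five-minute SAFETY_MARGIN and the function name are hypothetical, and the right margin depends on how far your clock can drift from the web server's clock.

```python
from datetime import datetime, timedelta

# Hypothetical safety margin; not a value taken from the product.
SAFETY_MARGIN = timedelta(minutes=5)

def is_safe_once_time(run_at: datetime, local_now: datetime,
                      margin: timedelta = SAFETY_MARGIN) -> bool:
    """True when run_at is at least `margin` ahead of the local clock."""
    return run_at - local_now >= margin

local_now = datetime(2024, 1, 1, 13, 0)
print(is_safe_once_time(datetime(2024, 1, 1, 13, 1), local_now))   # False
print(is_safe_once_time(datetime(2024, 1, 1, 13, 10), local_now))  # True
```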

Important considerations when setting up a dataset for automation:

  • If an automated Project uses this dataset for input, you must ensure a safe buffer of time for this dataset update to finish uploading in the Data Library before the automated run of the Project begins.
  • The time you specify here is when this job will be added to the queue for uploading and not necessarily the start time for the automated import.
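
The buffer consideration above amounts to simple time arithmetic. The following Python sketch assumes you can estimate how long the dataset import takes; the function name, the 30-minute import estimate and the 30-minute buffer are all hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta

def has_safe_buffer(dataset_queued_at: datetime, project_queued_at: datetime,
                    expected_import: timedelta, buffer: timedelta) -> bool:
    """True when the Project is queued late enough for the dataset update to finish first."""
    return project_queued_at >= dataset_queued_at + expected_import + buffer

dataset_time = datetime(2024, 1, 8, 2, 0)   # dataset update queued at 2:00 a.m.
import_estimate = timedelta(minutes=30)     # assumed import duration
buffer = timedelta(minutes=30)              # assumed extra margin for queue delay

print(has_safe_buffer(dataset_time, datetime(2024, 1, 8, 2, 45), import_estimate, buffer))  # False
print(has_safe_buffer(dataset_time, datetime(2024, 1, 8, 4, 0), import_estimate, buffer))   # True
```

Because the scheduled time is only when the job joins the queue, the buffer should also cover any wait before the import actually starts.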


Set up notifications

Email notifications can be sent to notify users of either a successful upload into the Data Library or errors that have occurred. An Error email provides a link to the file's log file where you can determine the cause of the error(s).

To set up notifications:

  1. Click the drop-down menu to select which type of email notification to send: "Errors" or "Success".
  2. Add the email address and press Enter.

Important considerations:

  • An email address can be added only once for each notification type.
  • Recipients must have the required system permissions to view the automation results.

Save your automation configuration settings
Click the "Save" button at the top of the configuration form to save all of your settings. After saving, notice the "Add to Queue" button that displays:

The button allows you to add this automation job to the queue of upcoming jobs that will be run the next time automation starts. This option is useful if you want to test out this automation configuration without having to wait for its scheduled run time. Note: the Automation panel provides details of when the next automated run is scheduled to start, and you can quickly navigate to the Schedules panel by clicking the "View Schedules Now" link that displays in the header after you click the "Add to Queue" button:
 

Overview of Project automation configuration page

To open the Project automation page, open your Project and click the automation status button:
 


The Automation configuration page opens:


Job Name, Job Description and Project
The job’s name and description are listed in these fields. These are the name and description provided when the Project was originally created. They can be changed here for the automated version of your Project by entering new information into the fields.

The Project field provides a link that automatically opens the Project you are setting up for automation. This link is particularly useful if you have multiple Versions of a Project but are automating one specific Version of that Project in this configuration form.

Important: you may also notice a check box labeled "Set me as owner of this automated schedule". This option appears only if you are neither the person who initially set up this Project for automation nor the current owner of its automated schedule. Ownership is significant because it provides a way to identify and audit users who are running automated jobs in the system. Typically, this option is used when automation responsibilities are transitioned to a new person in an organization. If you take ownership of an automation job, you must have all of the permissions required for every operation the automation performs.


Import Datasets
The datasets you have already imported into your Project are listed here. If no datasets are listed, verify that you have saved the most recent set of changes to your Project Steps.

Your "Base" dataset is listed and any "Lookup" or "Append" datasets for the Project are listed above it.

If this Project uses a dataset that is set up for automation in the Data Library, the schedule is displayed here for your reference. When using an automated dataset, you must consider its automation schedule and allow a safe buffer of time for the new version to be published in the Data Library.

To set up a dataset for automation before you configure automation for this Project, click the "Set it up now?" link adjacent to the dataset’s name. You are taken to the Data Library scheduling page where you set up the automation parameters and schedule. These steps are covered above in Steps to set up a Data Library dataset for automation.

Note: the "Use Latest Version" default setting refers to the version of the dataset that is used for this automation configuration. See the section Steps to set up a Project for automation (below) for details on which version you should select for automating this Project.

Schedules
Any upcoming schedules for the Project are displayed here. The "Add" button allows you to set up new schedules. The "Deactivate" link in this panel allows you to indefinitely suspend all scheduled jobs for this Project until you return and click the "Reactivate" button. To set up schedules, see the section below: Steps to set up a Project for automation.


Notifications
Emails can be sent to notify users when automated Projects finish updating or have errors. To set up notifications, see the section below: Steps to set up a Project for automation.


Publish AnswerSets
Select a Lens for publishing an AnswerSet.
A Lens is pinned to a Step in your Project and creates a publishing point that can be used by automation to publish an AnswerSet. You can save the setup for this Project’s automation without selecting a Lens, but an automated run of this Project will not succeed until you select a Lens.

Automated Projects are automatically published to the Data Library. However, automation can also be configured to export the published output to an external data source. See the section below for details: Steps to set up a Project for automation.


Steps to set up a Project for automation

Import Datasets
For each dataset used in your Project, choose to use the "Latest Version" or "Current Version" for input:
  • "Latest" will use the most up-to-date version of the dataset in the Data Library when this automated job is run. Note: this will result in a new Version of your Project each time this automated configuration runs. When selecting the latest version, an additional option is available to specify if an automated run should fail because the latest version of the dataset has a different layout (schema)—for example new columns added, removed columns that are not used in the Project's Steps, different column types for existing columns, new order, etc. 
    Note: at least one of the datasets used for automating this Project must be "Latest Version". Otherwise, if no changes occur in the input datasets after an automated run of your Project, the platform will not re-run this job.

  • "Current" pins the dataset, in its current state, for all future automated runs. Using "Current" may be useful when a static dataset serves as a reference table for your Project.
Important: if there have been no changes to the input datasets since the last Project automation run, then automation will not run again for the Project until there are changes to the Project. Therefore, at least one of the datasets used for automating this Project must be "Latest Version".
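
The two ideas in this section, detecting a layout (schema) change in the latest version and re-running only when a "Latest Version" input has changed, can be sketched as follows. This is a minimal illustration under stated assumptions and not the platform's implementation: the function names, the (column name, column type) representation, the type labels and the dictionary keys are all hypothetical.

```python
def schema_changes(old, new):
    """Compare two schemas given as ordered lists of (column_name, column_type)."""
    old_types = dict(old)
    new_types = dict(new)
    changes = {
        "added": [name for name, _ in new if name not in old_types],
        "removed": [name for name, _ in old if name not in new_types],
        "retyped": [name for name, _ in new
                    if name in old_types and old_types[name] != new_types[name]],
    }
    # Column order is compared only for columns present in both versions.
    shared_old = [name for name, _ in old if name in new_types]
    shared_new = [name for name, _ in new if name in old_types]
    changes["reordered"] = shared_old != shared_new
    return changes

def should_rerun(inputs):
    """True when at least one "latest" input has a newer version than the one last used."""
    return any(
        item["mode"] == "latest" and item["library_version"] > item["used_version"]
        for item in inputs
    )

previous = [("id", "number"), ("city", "text"), ("amount", "text")]
latest = [("id", "number"), ("amount", "number"), ("region", "text")]
print(schema_changes(previous, latest))
# {'added': ['region'], 'removed': ['city'], 'retyped': ['amount'], 'reordered': False}

pinned_only = [{"mode": "current", "used_version": 3, "library_version": 5}]
with_latest = pinned_only + [{"mode": "latest", "used_version": 2, "library_version": 3}]
print(should_rerun(pinned_only))  # False
print(should_rerun(with_latest))  # True
```

The pinned_only case shows why at least one input must use "Latest Version": a pinned dataset never presents a change, so nothing triggers another run.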


Set up schedules

Click "Add" to set up a new time for this Project to run. The default setting for Project automation frequency is to "Repeat" on the time and day you specify here. The "Repeat" button is a toggle that you can click to switch the automation to run "Once" at the time you specify.

Steps for "Repeat" set up:

  1. Use the up and down arrows or enter values in the fields to adjust the time.
  2. Toggle the PM or AM button to select the correct period.
  3. Select the frequency: "week", "day" or "month". Note that "week" is the default. Click in the field to make a different selection.
  4. Depending on your frequency selection, specify the day of the week or date in the month.
  5. Click "Okay" to add the schedule.
  6. Your newly added schedule displays. Click the pencil icon to edit it or the "x" button to delete it.

Note:

  • The time you select here is based on your current time zone.
  • The time, day or date you select here must be in the future. For example, if it is currently 1pm on Monday and you set up the automation to run at 10am every Monday, then automation will not run today for this Project.

Steps for "Once" set up:

  1. Click in the date field to open a calendar picker.
  2. Use the up and down arrows to adjust the time.
  3. Toggle the PM or AM button to select the correct period.
  4. Click "Okay" to add the schedule.
  5. Your newly added schedule displays. Click the pencil icon to edit it or the "x" button to delete it.

Note:

  • The time you select here is based on your current time zone.
  • When configuring a Project’s automation to run only "Once", do not set the job’s start time too near to the current time. Your local computer’s clock may not be precisely in sync with the web server that will process the job. If your local computer’s clock is running behind the web server’s clock, the time you specify for the job may have already passed on the web server. In this case, your job will not start.
  • If you want to simply test one automated run of this Project, use the "Add to Queue" feature instead of setting it up to run "Once". For details, see the section below: Save your Project automation configuration settings. 

Important considerations when setting up a Project for automation:

  • If a Project’s automation depends on input from an automated Data Library file or an AnswerSet published from another automated Project, ensure a safe buffer of time for all input updates to finish before the automated run of the Project begins.
  • The time you specify in the automation set up is when this Project will be added to the queue for publishing an AnswerSet, and not necessarily the publishing start time.


Set up notifications

Emails can be sent to notify users when automated Projects finish updating or have errors. An Error email provides a link to the Project's log file where you can determine the cause of the error(s).

To set up notifications:

  1. Click the drop-down menu to select which type of email notification to send: "Errors" or "Success".
  2. Add the email address and press Enter.
Important considerations:
  • An email address can be added only once for each notification type.
  • Recipients must have the required system permissions to view the automation results.


Select Lenses and publish destinations
To add a Lens:

  1. Click the green Add button:
     

  2. A Lens from the Project is added to this automation configuration. By default, the Lens that occurs earliest in your Project Steps is selected. To change the default selection, click the drop-down and select a different Lens that currently exists in your Project. To add more Lenses for this automated run of your Project, click the Add button again and continue to select Lenses.

    To disable a Lens, click the green "On" button for the Lens to toggle it off; to remove a Lens, click the "x" button for the Lens. If you need a new Lens for automating this Project, open the Project and add the Lens on the desired Step. For more help on adding Lenses to your Project, see Project Lenses.

  3. The default location for publishing this Project when automation runs is "Library Only". If you want to export the published output to an external data source, in addition to publishing it to the Data Library, click the drop-down and select "Library & Data Source":

    • Name: the name that will be used for automated versions of this Project.
    • Data Source Name: click the drop-down to select an available data source. Note: only the data sources that have been configured for export, and that you have permissions to access, are displayed in the drop-down selection for Data Source Name. Contact your System Administrator if you don't see the data source you want to select for export.
    • Directory Path or Database Name: provide the path or database on the data source where the export will be written.
    • Format: depending on the data source you select for export, the option to select a file format is also available. Any applicable parsing options are also presented.
    • Credentials: the user credentials for writing to the selected data source are presented here. You can edit the credentials here.
    • When the "Create unique name" option is enabled, automation appends an underscore and time stamp to the file or table name for each successive automated export so that any previous exports of this Project are not overwritten on the data source. Important: if you enable this option for a JDBC data source, ensure that your system administrator has also enabled the "Automatically Create Table" option in the JDBC Connector form. Otherwise, automation for this Project will fail.
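
The effect of the "Create unique name" option can be pictured with a short Python sketch. The article states only that an underscore and a time stamp are appended; the exact time stamp format used below (year, month, day, hour, minute, second) and the function name are assumptions made for illustration.

```python
from datetime import datetime

def unique_export_name(base_name: str, run_time: datetime) -> str:
    """Append an underscore and a time stamp so earlier exports are not overwritten."""
    # Assumed format; the product's actual time stamp format may differ.
    return f"{base_name}_{run_time.strftime('%Y%m%d%H%M%S')}"

print(unique_export_name("sales_answerset", datetime(2024, 1, 8, 10, 0, 0)))
# sales_answerset_20240108100000
```

Each run produces a new file or table name, which is why a JDBC data source must be allowed to create tables automatically.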

Save your Project automation configuration settings

Click the "Save" button in the upper right pane to save all configurations you have made for automating this Project. After saving the automation schedule, notice the "Add to Queue" button that displays:

This button allows you to add this automation job to the queue of upcoming jobs that will be run the next time automation starts. This option is useful if you want to test out this automated configuration without having to wait for its scheduled run time. Note: the Automation panel provides details of when the next automated run is scheduled to start and you can quickly navigate to the Schedules panel by clicking the "View Schedules Now" link that displays in the header after you click the "Add to Queue" button.



Automation Dashboard


Overview

The Automation dashboard provides details and history for all Data Library files and Projects that are set up to be automated. This is where you:

  • view your automation usage details.
  • view and manage the schedules for automated jobs.
  • view job execution history and statuses.
  • re-run failed jobs.

The dashboard is organized by Schedules and Job Details.

Schedules

The Schedules page displays a list of all Data Library files and Projects that are currently configured for automation. To view your automation usage details, mouse over the meters for additional information regarding the number of automated jobs you’ve already completed and the maximum number you can run for the day, week or month.

The Schedules page can also be filtered in a variety of ways to display:

  • "Active" jobs or "Inactive" jobs that have had their automation schedules deactivated.
  • types of jobs—Project only or Library only.
  • automation jobs that you own.
  • job states—"Success", "Complete with Error", "Error" or "Over Limit". See Definition of Job States below for the meaning of each state.
  • jobs that last finished during a date range that you specify or that will be run in the next automated run based on the range you provide.

You can re-run any job by clicking the "Add to Queue" link. This creates an internal schedule on-the-fly for the job, which triggers automation to run the job the next time the automation service wakes up to run regularly scheduled jobs. It's important to keep in mind that system resources must be available in order to run a queued job. For example, the number of threads allocated to run automation jobs must be sufficient. Otherwise, the job will remain in a queued state until resources become available to run it.
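
The queueing behavior described above can be modeled with a small simulation. This is a conceptual sketch only: the AutomationQueue class, its method names and the notion of "worker slots" standing in for allocated threads are hypothetical and do not reflect the actual automation service.

```python
from collections import deque

class AutomationQueue:
    """Toy model: queued jobs start only while worker slots are free."""

    def __init__(self, worker_slots: int):
        self.free_slots = worker_slots
        self.queued = deque()
        self.running = []

    def add_to_queue(self, job: str) -> None:
        self.queued.append(job)

    def wake_up(self) -> None:
        """Start queued jobs in order; jobs without a free slot stay queued."""
        while self.queued and self.free_slots > 0:
            self.running.append(self.queued.popleft())
            self.free_slots -= 1

    def finish(self, job: str) -> None:
        self.running.remove(job)
        self.free_slots += 1

queue = AutomationQueue(worker_slots=1)
queue.add_to_queue("Library job A")
queue.add_to_queue("Project job B")
queue.wake_up()
print(queue.running, list(queue.queued))  # ['Library job A'] ['Project job B']

queue.finish("Library job A")
queue.wake_up()
print(queue.running, list(queue.queued))  # ['Project job B'] []
```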

Note:
  • to determine the errors before re-running a job, go to the "Job Details" tab and open the "Results" page for that job run.
  • "Add to Queue" for re-running a job with errors does not count against your existing automation guardrail limits.

To make changes to a job's configuration settings, including deactivating it, click the job's name to open its configuration page. Then make and save the configuration change(s). 

Job Details

The Job Details page provides an audit trail for every executed automated run, including automated jobs that have been deleted. To view your automation usage details, mouse over the meters for additional information regarding the number of automated jobs you’ve already completed and the maximum number you can run for the day, week or month.

The Job Details page can also be filtered in a variety of ways to display:

  • types of jobs—Project only or Library only.
  • job states—"Success", "Complete with Error", "Error", "Over Limit", "Queued", "Running". See Definition of Job States below for the meaning of each state.
  • jobs that last started or last finished during a date range that you specify.

To display granular details for a job run, click the row for that job. The "Results" page for the job opens and displays a snapshot of the configuration settings used for this instance of the job run:

Note: because this is a snapshot, the job settings may have changed since this automated run.


If this is a Project job, click the "View Lens" link to open the Project to the Lens that was used to publish the AnswerSet for this run. Click the "View AnswerSet" link to view the AnswerSet published by this run:

 


If this is a Library job, click the "View Dataset" link to open the file in the Data Library:

If any errors occurred during the run, they are displayed here. Download the log file for this job by clicking the "Download log" link:


Definition of Job States

There are six possible states for an automated job.

  • Running: job run is currently in progress.

  • Success: job successfully finished with no errors.

  • Error: job run failed.

  • Completed with errors: job run completed, but there were errors that prevented a complete run—for example, a job that successfully published to the Data Library but was unable to export to the specified Data Source will complete with this type of error.

  • Queued: when a job is queued an internal schedule is created on-the-fly for it, which triggers automation to run the job the next time the automation service wakes up to run regularly scheduled jobs. However, it's important to keep in mind that system resources must be available in order to run a queued job. For example, the number of threads allocated to run automation jobs must be sufficient. Otherwise, the job will remain in a queued state until resources become available to run it.

  • Over limit: when a job run exceeds the daily, weekly or monthly guardrail limits, then the job fails with an "Over limit" error. Important notes:
    • automation guardrails are enforced at the tenant level.
    • the weekly automation limit is defined as 00:00 Monday—23:59 Sunday.
    • a job that ends in error is counted toward your limits, but a retry of the failed job (through "Add to queue") is not counted.
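
The weekly guardrail notes above can be sketched in Python as a counting rule. This is an illustration under assumptions, not the platform's logic: the record fields ("started", "state", "is_retry"), the function names and the choice to treat a new run as over limit once usage has reached the limit are all hypothetical.

```python
from datetime import datetime, timedelta

def week_window(moment: datetime):
    """Return (start, end) of the week containing `moment`: 00:00 Monday up to the next Monday."""
    start = (moment - timedelta(days=moment.weekday())).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return start, start + timedelta(days=7)  # end is exclusive

def weekly_usage(runs, moment: datetime) -> int:
    """Count runs in the current week; failed runs count, retries via "Add to Queue" do not."""
    start, end = week_window(moment)
    return sum(1 for run in runs if start <= run["started"] < end and not run["is_retry"])

def is_over_limit(runs, moment: datetime, weekly_limit: int) -> bool:
    # Assumption: one more run is refused once usage has reached the limit.
    return weekly_usage(runs, moment) >= weekly_limit

runs = [
    {"started": datetime(2024, 1, 7, 23, 30), "state": "Success", "is_retry": False},  # Sunday, previous week
    {"started": datetime(2024, 1, 8, 0, 0), "state": "Success", "is_retry": False},    # Monday 00:00
    {"started": datetime(2024, 1, 9, 10, 0), "state": "Error", "is_retry": False},     # failed run still counts
    {"started": datetime(2024, 1, 9, 11, 0), "state": "Success", "is_retry": True},    # retry does not count
]
now = datetime(2024, 1, 10, 9, 0)  # Wednesday of the same week
print(weekly_usage(runs, now))                    # 2
print(is_over_limit(runs, now, weekly_limit=2))   # True
```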