|Your Paxata Administrator must enable this feature in your application.|
Automatic Project Flows, or APF, allows you to intelligently operationalize curated data flows. With a single click, APF computes the entire sequence of data prep Steps across Paxata Projects, datasets and AnswerSets to produce an end-to-end, automated output Flow for your data. You can set the Flow to run on a recurring time-based schedule, or run it just once to produce an end-result AnswerSet. All runs can then be easily managed through the Monitoring Interface.
How does APF help you with your data prep work?
Business Analysts and Data Engineers can simplify complex data flows by breaking them into smaller groups of Paxata Projects that can be operationalized—with each Project focused on performing a related or cohesive set of Steps for improved readability and limited complexity. When you're finished creating your Projects, simply select the final Project in the sequence as your "target" Project. APF takes care of the rest—sequencing, preparing and automating the entire end-to-end flow without any manual stitching required.
How does APF help your team with its data prep work?
If your team requires input from both Business and IT, it can simplify the data prep process when members build Paxata Projects that depend on output AnswerSets created by others. Everyone completes their data prep work in their own Paxata Project, and then the entire sequence is operationalized from a single "target" Project. APF takes care of the rest with no manual stitching required, regardless of who created or owns the Projects and AnswerSets. Members of the team can then use the APF Monitoring Interface to view how their Projects and AnswerSets participate in the Flow's final output.
Example of APF in action
In the example illustrated above, the end-state "Sales Variance Report" is produced from a series of Paxata Projects and AnswerSets produced by multiple people. Bob connects to the data lake for his "Product Hierarchy" data, preps and produces an AnswerSet that is shared with Susan who, additionally, pulls in "Sales Transaction history" data from a Cloud application. She preps all of this data and produces an AnswerSet, which is then shared with you for the Sales Variance Project that you maintain. In addition to the AnswerSet from Susan, you also need to combine data from an Excel report that you pull in from a cloud storage system. When you're finished with your data prep, you then produce a "Sales Variance Report" AnswerSet. Because you need to produce this report each week, the APF feature makes your data prep work a breeze. You simply click the "Create Project Flow" button in your Sales Variance Project, configure a time-based trigger for running the Flow, and APF takes care of the rest by intelligently traversing back through the Flow of related Projects, AnswerSets and datasets to create the dependency chain required to produce your end-state AnswerSet. You can then use the APF Monitoring Interface to manage all subsequent runs of your Flow.
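The backward traversal described in this example can be pictured as a dependency walk followed by a topological sort. The following is a minimal sketch under stated assumptions, not the APF engine's actual implementation: the `DEPENDS_ON` map, the node names, and both helper functions are hypothetical and exist only to illustrate the idea.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map for the example above: each entry lists the
# inputs that a Project or AnswerSet consumes directly. The names are
# illustrative labels, not real Paxata identifiers.
DEPENDS_ON = {
    "Sales Variance Report": ["Sales Variance Project"],
    "Sales Variance Project": ["Susan's AnswerSet", "Excel report"],
    "Susan's AnswerSet": ["Susan's Project"],
    "Susan's Project": ["Bob's AnswerSet", "Sales Transaction history"],
    "Bob's AnswerSet": ["Bob's Project"],
    "Bob's Project": ["Product Hierarchy"],
}


def upstream_closure(target, depends_on):
    """Walk backwards from the target and collect everything it needs."""
    needed = {}
    stack = [target]
    while stack:
        node = stack.pop()
        if node in needed:
            continue  # already visited
        needed[node] = list(depends_on.get(node, []))
        stack.extend(needed[node])
    return needed


def run_order(target, depends_on):
    """Return the nodes in an order where every input precedes its consumer."""
    graph = upstream_closure(target, depends_on)
    return list(TopologicalSorter(graph).static_order())


order = run_order("Sales Variance Report", DEPENDS_ON)
for step in order:
    print(step)
```

In this sketch, Bob's Project always appears before Susan's Project, and the "Sales Variance Report" is always last, which mirrors the dependency chain described above.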
A few important things to note about APF:
Creating a Project Flow is as simple as opening your "target" Project—the Project that will produce your end-state AnswerSet—and clicking the "Create Project Flow" button in the top right-hand corner of that Project.
Note: APF is a feature that must be enabled. If you do not see this button in your Project, contact your Paxata System Administrator.
You are prompted to provide a name and an optional description for the Flow. By default, your target Project's name is used for the Flow, but you can change it here. Then click the Create button.
The intelligent automation engine then calculates all of the Flow dependencies for you and presents the Configuration Interface where you set the triggers and notifications, and tweak any settings for the Flow's input and output datasets. See the next section for an explanation of the Configuration Interface.
The Configuration Interface has three tabs where you configure the settings for your Flow. It is presented when you first create a Flow, and it also opens when you choose to "edit" any of the saved Flows displayed in the Monitoring Interface. The three tabs are described below. Note that the "Graph", "Actions", "Discard Changes" and "Save" buttons are always present in the Configuration Interface and provide common actions you can take for all Flows.
Note: as soon as a Flow is created, a Project Flow ID also displays on the General tab. This ID is used to identify the Flow in REST API calls and in any required troubleshooting of the Flow.
The Inputs tab provides a list of all the datasets used in the Flow, the versions of those datasets that were used to create the Flow, and the Projects in which each dataset is used. There are three actions you can take on this tab:
Note: you can easily determine metadata statistics for the dataset inputs by hovering your mouse over a dataset name in the DATASETS column. The dataset's version, creation date and user who added it to the Library, and the number of columns and rows are displayed in a pop-up window.
The Outputs tab provides a list of all the output AnswerSets that are published from the Flow. Because a publishing lens is always required to create a publishing point from a Paxata Project, all of the outputs are configured at the lens level. There are times when your Flow may include a Project that has multiple lenses, but not all of those lenses are required to produce output AnswerSets required for the Flow. By default, only required lenses automatically publish AnswerSets that are saved in the Library. However, if you'd like to enable the publish for AnswerSets that are not required for the Flow, then you can enable them here.
Note: lenses that produce output AnswerSets required for the Flow can never be disabled.
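The publishing rule above (required lenses always publish, optional lenses publish only when enabled) can be sketched as follows. This is an illustrative model under stated assumptions only: the `Lens` class, its fields, and `set_publish` are hypothetical names and do not correspond to a Paxata API.

```python
from dataclasses import dataclass


@dataclass
class Lens:
    """Hypothetical model of a publishing lens in a Flow."""
    name: str
    required: bool         # True if the Flow needs this lens's AnswerSet
    publish: bool = False  # optional lenses do not publish by default

    def __post_init__(self):
        # Lenses required by the Flow always publish.
        if self.required:
            self.publish = True


def set_publish(lens, enabled):
    """Enable or disable publishing; required lenses cannot be disabled."""
    if lens.required and not enabled:
        raise ValueError(f"lens {lens.name!r} is required by the Flow")
    lens.publish = enabled


required_lens = Lens("Sales Variance Report", required=True)
optional_lens = Lens("Scratch output", required=False)

set_publish(optional_lens, True)  # allowed: opt in to an extra AnswerSet
```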
In addition to adjusting the publish options for non-essential AnswerSets, you may choose to publish any lens output AnswerSet to an external data source, for example a database, cloud storage system, etc. To specify a publish location in addition to the Paxata Library, click the Configure Lens button to open the Exports panel.
There are two actions you can take on this tab:
The Monitoring Interface is used to monitor the status of all Flows. The interface is organized by Snapshots, Runs, and Chores because these are the key components for generating a Flow's output.
The following diagram illustrates how to locate the monitoring information you need for a Flow. Each of these pages is explained below the diagram.
1. Project Flows page lists all of the Flows that you have permissions to view and edit, and the current status of the most recent run for each (succeeded, failed, etc.). There are three actions you can take from this page:
In addition, the More Actions option on the page allows you to:
2. Snapshots page lists all of the Snapshots for a Flow. Every time a Flow is executed, which is called a "run" of the Flow, a Snapshot is created to capture the configuration settings used to create the output for the run. The runs will continue with this Snapshot until any configuration changes are made to the Flow—for example changes to the schedule, notifications, inputs, output settings, etc. Then a new Snapshot is created for the Flow and the new Snapshot captures all of the executed runs with the modified configuration settings.
Snapshots provide clear auditability of the exact state of a Project Flow for each run.
Important: a new Snapshot is not created if datasets are configured to use the latest version from the Library. See the Inputs section for dataset configuration options.
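One way to picture the Snapshot behavior is to fingerprint the Flow's configuration and start a new Snapshot only when the fingerprint changes. The sketch below is a simplified model under stated assumptions, not the product's implementation: `FlowHistory`, `config_fingerprint`, and the configuration keys are all hypothetical.

```python
import hashlib
import json


def config_fingerprint(config):
    """Hash a configuration dict in a key-order-independent way."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class FlowHistory:
    """Hypothetical record of Snapshots and the runs grouped under each."""

    def __init__(self):
        self.snapshots = []  # each entry: {"fingerprint": str, "runs": [...]}

    def record_run(self, config, run_id):
        fingerprint = config_fingerprint(config)
        latest = self.snapshots[-1] if self.snapshots else None
        if latest is None or latest["fingerprint"] != fingerprint:
            # The configuration changed (or this is the first run):
            # start a new Snapshot.
            latest = {"fingerprint": fingerprint, "runs": []}
            self.snapshots.append(latest)
        latest["runs"].append(run_id)


history = FlowHistory()
config = {"schedule": "weekly", "inputs": {"Product Hierarchy": "latest"}}

history.record_run(config, "run-1")
history.record_run(config, "run-2")  # same settings: same Snapshot

changed = dict(config, schedule="daily")
history.record_run(changed, "run-3")  # schedule changed: new Snapshot
```

Because the input in this sketch is configured as "latest", a newer dataset version in the Library does not alter the configuration itself, so the fingerprint stays the same and no new Snapshot is started.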
There are two actionable items on this page:
3. Run List page captures all details for each individual run under a Snapshot. The number of discrete chores that must be completed in order to finish the run (for example, publishing a dependency AnswerSet) is listed on the page. Every time a Flow is run, a new run entry displays on this page. Important: if there is no change to the data used to create the Flow, for example all of the datasets used in the Flow remain at exactly the same versions that were used in the previous run, then the APF engine conserves resources and does not re-run the Flow until new data inputs are available. There is one actionable item on this page:
Important: the APF quotas meter displays at the top of the Flows page to indicate your usage. When hovering over any one of the counts for Daily, Weekly or Monthly, a tooltip displays to provide details of your current usage and limit.
Note that quotas are based on "chore" count, and chores are defined as:
The sum of all chores ultimately produces the output for your Flow. While a Flow is in the process of running, you will need to refresh your browser to update the quotas meter on the Flow's page. If you need your quotas for chore count increased, please contact your Paxata Administrator or Paxata Customer Success.
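To make the chore-based quota concrete, the sketch below counts the chores in a run (dataset imports plus Project executions, per the glossary definition of a chore) and checks them against daily, weekly, and monthly limits. The limit values, the `Quota` class, and the helper functions are assumptions chosen for illustration; actual limits are set by your Paxata Administrator.

```python
from dataclasses import dataclass


@dataclass
class Quota:
    """Hypothetical quota counter for one period (daily, weekly, monthly)."""
    limit: int
    used: int = 0

    def remaining(self):
        return max(self.limit - self.used, 0)


def chores_for_run(dataset_imports, project_executions):
    """A chore is a dataset import or a Project execution."""
    return dataset_imports + project_executions


def can_run(chores, quotas):
    """A run fits only if every period has enough chores remaining."""
    return all(q.remaining() >= chores for q in quotas.values())


def charge(chores, quotas):
    """Record the chores of a completed run against every period."""
    for q in quotas.values():
        q.used += chores


# Illustrative limits only.
quotas = {"daily": Quota(10), "weekly": Quota(50), "monthly": Quota(200)}

chores = chores_for_run(dataset_imports=2, project_executions=3)  # 5 chores

results = []
for _ in range(3):
    allowed = can_run(chores, quotas)
    results.append(allowed)
    if allowed:
        charge(chores, quotas)

print(results)  # [True, True, False]
```

The third run is refused in this sketch because the first two runs used all 10 chores of the hypothetical daily limit.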
The following actions can be taken for all saved Flows:
Generate a visual graph for a Flow
The Graph button generates an APF graph in a new browser window that displays the datasets and how they flow into the individual Projects used to generate a Flow's final output AnswerSet.
Hovering over any dataset or Project in the Flow also displays the corresponding downstream lineage (in pink) and upstream dependencies (in blue).
For example, when hovering over the dataset for March 2016 Transactions:
When hovering over an intermediate Project in the Flow—in this example Customer Loyalty-Women Members—the upstream dependencies display through the blue lines while the downstream lineage displays through the pink lines.
Notice in both examples that if datasets and Projects do not participate in the portion of the Flow that you've selected, then they are grayed out in the graph.
Note that you may see a dotted line in a graph for some Flows. The dotted line indicates that an AnswerSet was published from a Project in the Flow, and then later consumed again by the same or another Project in the Flow. This is referred to as a looping input and is represented by the dotted line.
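The hover behavior described above amounts to two reachability searches over the Flow graph: one against the direction of the edges (upstream dependencies, blue) and one along them (downstream lineage, pink). The sketch below is an illustration under stated assumptions: the edge list is hypothetical apart from the two names taken from the examples above, and the functions are not part of any Paxata API.

```python
# Each edge points from an input to the Project or AnswerSet that consumes it.
EDGES = [
    ("March 2016 Transactions", "Customer Loyalty-Women Members"),
    ("Member List", "Customer Loyalty-Women Members"),     # hypothetical
    ("Customer Loyalty-Women Members", "Loyalty Report"),  # hypothetical
    ("Store Locations", "Store Report"),                   # hypothetical, unrelated branch
]


def reachable(start, edges, forward=True):
    """Collect every node reachable from start, following or reversing edges."""
    adjacency = {}
    for src, dst in edges:
        a, b = (src, dst) if forward else (dst, src)
        adjacency.setdefault(a, set()).add(b)
    seen, stack = set(), [start]
    while stack:
        for nxt in adjacency.get(stack.pop(), ()):
            if nxt not in seen:  # the seen set keeps looping inputs finite
                seen.add(nxt)
                stack.append(nxt)
    return seen


def highlight(node, edges):
    """Classify nodes the way the graph colors them on hover."""
    all_nodes = {n for edge in edges for n in edge}
    upstream = reachable(node, edges, forward=False)   # blue
    downstream = reachable(node, edges, forward=True)  # pink
    grayed_out = all_nodes - upstream - downstream - {node}
    return {"upstream": upstream, "downstream": downstream, "grayed_out": grayed_out}


result = highlight("Customer Loyalty-Women Members", EDGES)
```

Nodes that fall in neither set, such as the unrelated branch in this sketch, correspond to the grayed-out items in the graph.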
Run a Flow manually
There may be times when you want to manually kick off a run of a Flow without having to wait for its scheduled start time. This can be done from the "Actions" drop-down. Select "Run now" and the Flow will be prepared for a run.
Delete a Flow
If you no longer want to keep a saved Flow, you can delete it. This can be done from the "Actions" drop-down. Select "Delete" and you are prompted to confirm your selection. Note that any AnswerSets that were published to the Library as a result of running this Flow are not deleted when you delete the Flow.
Update a Flow to use the latest Project Versions
Every time an action is taken in your Project—for example adding a Step, removing a Step, re-arranging Steps—a new version of your Project is created. Each version provides an audit trail of the changes you have made to your data during the course of your data prep work. When creating a Project Flow, the Flow is always pinned to the specific Project versions at the time of the Flow's creation. However, you can update a Flow to use the latest version of all Projects. This can be done from the "Actions" drop-down while on the Outputs tab. Select "Update Projects" and you are prompted to confirm your selection.
Note there are conditions that apply to updating Project versions, and Project versions cannot be updated if any Project in your Flow:
If you want to update only a specific Project's version, instead of all Projects in the Flow, this can be done from the Outputs tab: mouse over the Project for which you want to update the version, then click the blue "Update Project Version" button that displays in the right-hand column.
|Chore||A chore is a dataset import or a Project execution. The dataset import chore performs a re-import of your dataset through a data source. The Project execution chore addresses all other tasks required for the Flow, such as publishing an AnswerSet to the Library, export of an AnswerSet, etc.|
|Flow||A collection of Projects that can be run as a unit. One or more frequency-based schedules can be associated with a Flow, which allows a Flow to run on a recurring basis.|
|Inputs||Datasets from the Library that are required to run a Flow.|
|Outputs||The AnswerSets generated by a run of a Flow and written to the Library.|
|Run||The execution of each of the Projects that are required by the Target Project. The run executes all of the Steps from the upstream dependency Projects and then writes the resulting AnswerSet(s) to the Library.|
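|Snapshot||A record of the configuration settings used to create the output for a run of a Flow. Runs continue under the same Snapshot until the Flow's configuration changes, at which point a new Snapshot is created.|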
|Target Project||The Paxata Project from which a Flow is created. Once a Flow is created, all upstream dependencies are automatically calculated by the APF engine.|