[V2] Workflow configuration file

This is the description of the JSON configuration file for a V2 Workflow data operation.


A Workflow data operation specifies one or several data operation IDs that need to have executed successfully for a target data operation to be triggered.

Example

{
    "$schema": "http://jsonschema.tailer.ai/schema/workflow-veditor",
    "configuration_type": "workflow",
    "configuration_id": "workflow-Load_Configuration",
    "version":"2",
    "environment": "DEV",
    "short_description": "This is a short description",
    "doc_md": "readme.md",
    "account": "000099",
    "activated": true,
    "archived": false,
    "gcp_project_id": "fd-io-tailer-demo-dlk",
    "schedule_interval_reset": "59 23 * * *",
    "authorized_job_ids": [
        "gbq-to-gbq|000099_assert_of_configuration_DEV"
    ],
    "target_dag": {
        "configuration_type":"table-to-table",
        "configuration_id":"Load_Configuration_DEV"
    }
}
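
In this example, once the run of the job gbq-to-gbq|000099_assert_of_configuration_DEV has completed successfully, the Workflow triggers the table-to-table data operation Load_Configuration_DEV. Its triggering conditions are reset every day at 23:59.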

Parameters

$schema

type: string

optional

The URL of the JSON schema describing the properties your configuration must satisfy. Most code editors can use it to validate your configuration, display help boxes and highlight issues.

configuration_type

type: string

mandatory

Type of configuration file. In this case, the value has to be "workflow".

configuration_id

type: string

mandatory

ID of the workflow.

You can pick any name you want, but it has to be unique for this type of configuration file.

Note that in case of conflict, the newly deployed data operation will overwrite the previous one.

version

type: string

mandatory

Version of the configuration. Must be "2" in order to use the latest features.

Default: "1", for backward compatibility purposes, but only version "2" supports the latest features. Version 1 is deprecated.

environment

type: string

mandatory

Deployment context.

Values: PROD, PREPROD, STAGING, DEV.

short_description

type: string

optional

Short description of the context of the configuration file.

doc_md

type: string

optional

Path to a file containing a detailed description of the data operation. The file must be in Markdown format.

account

type: string

mandatory

Your account ID is a 6-digit number assigned to you by your Tailer Platform administrator.

activated

type: boolean

optional

Flag used to enable/disable the execution of the workflow.

If not specified, the default value will be "true".

archived

type: boolean

optional

Flag used to enable/disable the visibility of the workflow's configuration and runs in Tailer Studio.

If not specified, the default value will be "false".

gcp_project_id

type: string

mandatory

ID of the Google Cloud project containing the BigQuery instance to be triggered.

schedule_interval_reset

type: string

optional

Cron expression used to reset the workflow on a regular basis (see crontab.guru for examples). When a workflow is reset, all its triggering conditions are set back to false, so previous runs are no longer taken into account.

Example: for a daily job, you may want to reset the workflow every day at 23:59, so that runs from previous days do not count for the current day. To do so, set the parameter as follows:

"schedule_interval_reset": "59 23 * * *"

Default: None

authorized_job_ids

type: string array

mandatory

Job IDs of the data operations that need to be executed and successful for the current workflow to be triggered. A job ID can be retrieved from the Runs tab in Tailer Studio, under Run Details > Job Id. You can wait for several configurations by listing several job IDs in the array. Example: "gbq-to-gbq|000099_assert_of_configuration_DEV", as shown in the sketch below.
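
For instance, a workflow waiting for two upstream data operations would list each job ID in the array. A minimal sketch, with hypothetical job IDs following the pattern shown in the example above:

"authorized_job_ids": [
    "gbq-to-gbq|000099_load_sales_DEV",
    "gbq-to-gbq|000099_prepare_stocks_DEV"
]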

target_dag

type: dict

mandatory

Data operation to trigger. target_dag contains two key-value pairs:

configuration_type (string): the type of the data operation to trigger, with one of the following values:
- "storage-to-storage"
- "storage-to-table"
- "table-to-table"
- "table-to-storage"
- "vm-launcher"

configuration_id (string): the configuration_id of the data operation to trigger, as specified in the last part of its job ID. If you deployed the target job without any context, it is the configuration ID concatenated with the environment (for example yourConfId_PROD). If you deployed the target job with a context, it is the context ID, starting with the account ID, concatenated with the configuration ID (for example 000099_yourContextId_yourConfId), as shown in the sketch below.
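
For instance, to trigger a Table to Storage operation that was deployed with a context, the configuration_id combines the account ID, the context ID and the configuration ID. A minimal sketch, with hypothetical IDs:

"target_dag": {
    "configuration_type": "table-to-storage",
    "configuration_id": "000099_yourContextId_yourConfId"
}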
