Tables to Tables configuration file
This page describes the JSON configuration file of a Tables to Tables (TTT) data operation.
The configuration file contains the following sections:
Global parameters: General information about the data operation.
Workflow task parameters: Information about the different tasks of the workflow.
👁️🗨️ Example
Here is an example of a TTT configuration file:
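The sample below is an illustrative sketch assembled from the parameters documented on this page, not a configuration taken from a real project: the account, project, dataset, file and task names are placeholders, and the task_dependencies syntax assumes the Airflow-style ">>" operator described further down.

```json
{
  "configuration_type": "table-to-table",
  "configuration_id": "000099_sales_daily_load_DEV",
  "short_description": "Daily load of the sales table (illustrative example)",
  "environment": "DEV",
  "account": "000099",
  "activated": true,
  "archived": false,
  "start_date": "2024, 01, 01",
  "schedule_interval": "0 7 * * *",
  "max_active_runs": 1,
  "task_concurrency": 3,
  "catchup": false,
  "default_gcp_project_id": "my-gcp-project",
  "default_bq_dataset": "my_dataset",
  "default_write_disposition": "WRITE_TRUNCATE",
  "task_dependencies": [
    "create_sales_table >> load_sales_data >> copy_sales_data"
  ],
  "workflow": [
    {
      "task_type": "create_gbq_table",
      "id": "create_sales_table",
      "bq_table": "sales",
      "ddl_file": "ddl/sales.json",
      "force_delete": true
    },
    {
      "task_type": "sql",
      "id": "load_sales_data",
      "sql_file": "sql/load_sales_data.sql",
      "table_name": "sales",
      "write_disposition": "WRITE_TRUNCATE"
    },
    {
      "task_type": "copy_gbq_table",
      "id": "copy_sales_data",
      "source_gcp_project_id": "my-gcp-project",
      "source_bq_dataset": "my_dataset",
      "source_bq_table": "sales",
      "destination_bq_table": "sales_copy"
    }
  ]
}
```

The global parameters and each workflow task type are described in detail in the sections below.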
🌐 Global parameters
General information about the data operation.
Parameter | Description |
---|---|
$schema type: string optional | The URL of the JSON schema that describes the properties your configuration must verify. Most code editors can use it to validate your configuration, display help boxes and highlight issues. |
configuration_type type: string mandatory | Type of data operation. For a TTT data operation, the value is always "table-to-table". |
configuration_id type: string mandatory | ID of the data operation. You can pick any name you want, but it has to be unique for this data operation type. Note that in case of conflict, the newly deployed data operation will overwrite the previous one. To guarantee its uniqueness, the best practice is to name your data operation by concatenating several elements that identify it unambiguously. |
short_description type: string optional | Short description of the context of the data operation. |
doc_md type: string optional | Path to a file containing a detailed description of the data operation. The file must be in Markdown format. |
environment type: string mandatory | Deployment context. Values: PROD, PREPROD, STAGING, DEV. |
account type: string mandatory | Your account ID is a 6-digit number assigned to you by your Tailer Platform administrator. |
activated type: boolean optional | Flag used to enable/disable the execution of the data operation. Default value: true |
archived type: boolean optional | Flag used to enable/disable the visibility of the data operation's configuration and runs in Jarvis Studio. Default value: false |
start_date type: string mandatory | Start date of the data operation. The format must be: "YYYY, MM, DD", where YYYY is the year, MM the month and DD the day. |
schedule_interval type: string mandatory | A Tables to Tables data operation can be launched in two different ways: either manually or by another process, by setting this parameter to "None", or automatically at regular intervals, by setting it to a Cron expression. Example: for the data operation to start every day at 7:00, set the parameter to "0 7 * * *". You can find online tools to help you edit your Cron expression (for example, crontab.guru). |
max_active_runs type: integer optional | This parameter limits the number of concurrent runs for this data operation. As most data operations are run once daily, there is usually no need to set a value higher than 1 here. Default value: 1 |
task_concurrency type: integer optional | This parameter limits the number of tasks that can run concurrently. As each task might handle a large volume of data, it is important to avoid consuming too many resources for a single data operation. Also make sure that the value you set here is high enough for the concurrent tasks defined in the task_dependencies parameter to run properly. Default value: 5 |
catchup type: boolean optional | This parameter allows you to specify if you want to execute the data operation runs that were supposed to happen between start_date and the actual deployment date. Example: if you start receiving data on September 1st, but you only finish writing your code on September 7th, you might want to run your data operation from a date in the past: September 1st. The catchup parameter can have two values: true (the runs scheduled between start_date and the deployment date are all executed) and false (the data operation only runs from its deployment onwards). Default value: false |
task_dependencies type: array of strings mandatory | The task_dependencies parameter allows you to create dependencies between the different tasks specified in the workflow parameter (see below). It defines the order in which the workflow tasks will run, some of them running concurrently, others sequentially. Usually workflow tasks run in the following order: table creation (create_gbq_table), then SQL execution (sql), then table copy (copy_gbq_table).
Syntax: dependencies are expressed with the same operators as Airflow DAG dependencies. For detailed information about the syntax, refer to the Airflow documentation.
Example 1: we want to run the following tasks sequentially: taskA (create_gbq_table), taskB (sql) and taskC (copy_gbq_table).
Example 2: we want to run the following tasks concurrently: taskA, taskB and taskC.
Example 3: we want to order the following 9 tasks: taskA, taskD, taskG (create_gbq_table), taskB, taskE, taskH (sql), taskC, taskF, taskI (copy_gbq_table).
Example 4: same as example 3, except that we want taskH to run before taskE so we can use its result for taskE.
The corresponding task_dependencies values are shown in the sketch after this table. |
workflow type: array of maps mandatory | List of tasks the data operation will execute. Check the section below for detailed information on their parameters. |
default_gcp_project_id type: string mandatory | Default GCP project ID. This value can be overridden for any given task by setting the gcp_project_id parameter in the corresponding workflow sub-object. |
default_bq_dataset type: string mandatory | Default BigQuery dataset ID. This value can be overridden for any given task by setting the bq_dataset parameter in the corresponding workflow sub-object. |
default_bq_data_location type: string optional | The default BigQuery data location. Default value: EU |
default_write_disposition type: string mandatory | Action that occurs if the destination table already exists (see Google BigQuery documentation). Possible values: "WRITE_TRUNCATE" (overwrites the table data), "WRITE_APPEND" (appends data to the table) and "WRITE_EMPTY" (fails if the table already contains data). This value can be overridden for any given task by setting the write_disposition parameter in the corresponding workflow sub-object. |
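To make the task_dependencies examples above concrete, here is an illustrative rendering. It assumes the Airflow-style ">>" dependency operator; the task IDs are the placeholders used in the examples, and the notation shown for fully concurrent tasks (example 2) is only one possible form: tasks with no declared dependency simply run concurrently, within the limit set by task_concurrency.

```
Example 1 – taskA, taskB and taskC run sequentially:
  "task_dependencies": ["taskA >> taskB >> taskC"]

Example 2 – taskA, taskB and taskC run concurrently (no dependency declared):
  "task_dependencies": []

Example 3 – three independent chains, each one running create_gbq_table, then sql, then copy_gbq_table:
  "task_dependencies": [
    "taskA >> taskB >> taskC",
    "taskD >> taskE >> taskF",
    "taskG >> taskH >> taskI"
  ]

Example 4 – same as example 3, but taskH must run before taskE:
  "task_dependencies": [
    "taskA >> taskB >> taskC",
    "taskD >> taskE >> taskF",
    "taskG >> taskH >> taskI",
    "taskH >> taskE"
  ]
```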
➿ Workflow task parameters
A Tables to Tables workflow can include five types of tasks:
create_gbq_table: This type of task allows you to create the skeleton of a table based on a DDL file. You could compare it to the baking pan of your table.
sql: This type of task allows you to fill that baking pan using a SQL query or a SQL script from a file.
copy_gbq_table: This type of task allows you to duplicate a table named X into another table named Y.
expectation: This type of task allows you to perform data quality tests. See expectations for more details.
delete_gbq_table: This type of task allows you to delete a table or a partition of a table.
For each workflow sub-object, parameters will differ depending on the task type.
Refer to this page to learn how to create the DDL and SQL files corresponding to these tasks.
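As a quick illustration before the detailed tables, here is a minimal sketch of how several task types can sit side by side in the workflow array; the IDs, file paths and table names are placeholders, and only parameters documented on this page are used.

```json
"workflow": [
  { "task_type": "create_gbq_table", "id": "create_sales_table", "bq_table": "sales", "ddl_file": "ddl/sales.json" },
  { "task_type": "sql", "id": "load_sales_data", "sql_file": "sql/load_sales_data.sql", "table_name": "sales" },
  { "task_type": "copy_gbq_table", "id": "copy_sales_data", "source_gcp_project_id": "my-gcp-project", "source_bq_dataset": "my_dataset", "source_bq_table": "sales", "destination_bq_table": "sales_archive" },
  { "task_type": "delete_gbq_table", "id": "delete_tmp_sales", "bq_dataset": "my_dataset", "bq_table": "tmp_sales" }
]
```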
SQL task parameters
Parameter | Description |
---|---|
task_type type: string optional | The value has to be set to "sql" for this task type. As "sql" is the default type, this parameter can be omitted for this task type. |
id type: string mandatory | ID of the task. It must be unique within the data operation. |
short_description type: string optional | Short description of what the task does. |
doc_md type: string optional | Path to a file containing a detailed description of the task. The file must be in Markdown format. |
sql_file type: string mandatory | Path to the file containing the actual SQL query or the SQL script (can be assertions or expectations). This file is going to be read and its content uploaded to Firestore upon deployment of the data operation. |
gcp_project_id type: string optional | GCP Project ID. Overrides default_gcp_project_id. |
bq_dataset type: string optional | Name of the BigQuery destination dataset. Overrides default_bq_dataset. |
write_disposition type: string optional | Action that occurs if the destination table already exists (see Google BigQuery documentation). Possible values: "WRITE_TRUNCATE" (overwrites the table data), "WRITE_APPEND" (appends data to the table) and "WRITE_EMPTY" (fails if the table already contains data). Overrides default_write_disposition. |
table_name type: string mandatory if the SQL file contains a query, optional for a SQL script | Target table used upon SQL query execution. This parameter is optional if the SQL file contains a script, i.e. one or more statements separated by a semicolon. Note that you can use the FD_DATE date templates. Example: my_table${{FD_DATE}} will be rendered with the TTT logical date => my_table$20240223. |
criticality type: string optional | Task criticality, which defines the impact of an error on this task on the data operation's global status and behavior. It allows you to create tasks dedicated to warning tests (see expectations) that won't break the execution if they fail, or conditional tasks that can lead to different tasks depending on predefined conditions (see this article). Criticality can be: "critical", "warning", "transparent" or "break". Default value: "critical" |
temporary_table type: boolean optional | If set to "true", this parameter will force the deletion of the table at the end of the data operation. If the TTT data operation fails before deleting the temporary table, the table will be deleted automatically after 24 hours. Default value: false |
sql_query_template type: string optional | If you want to use variables in your SQL query or script, you need to set this parameter to "TEMPLATE_CURRENT_DATE" (only supported value). This variable will be set to the execution date of the data operation (and not today's date), so you can use it, for example, to retrieve only the data corresponding to the execution date. |
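A hypothetical SQL task sub-object combining the parameters above might look as follows (project, dataset, file and table names are placeholders):

```json
{
  "task_type": "sql",
  "id": "load_sales_data",
  "short_description": "Load the daily sales into the partitioned sales table",
  "sql_file": "sql/load_sales_data.sql",
  "gcp_project_id": "my-gcp-project",
  "bq_dataset": "my_dataset",
  "table_name": "sales${{FD_DATE}}",
  "write_disposition": "WRITE_APPEND",
  "sql_query_template": "TEMPLATE_CURRENT_DATE",
  "criticality": "critical",
  "temporary_table": false
}
```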
Table creation task parameters
Parameter | Description |
---|---|
task_type type: string mandatory | In this case, the value has to be set to "create_gbq_table". |
id type: string mandatory | ID of the task. It must be unique within the data operation. |
short_description type: string optional | Short description of what the task does. |
doc_md type: string optional | Path to a file containing a detailed description of the task. The file must be in Markdown format. |
gcp_project_id type: string optional | GCP Project ID. Overrides default_gcp_project_id. |
bq_dataset type: string optional | Name of the BigQuery destination dataset. Overrides default_bq_dataset. |
bq_table type: string mandatory | Name of the BigQuery table. |
ddl_file type: string mandatory | Path to the JSON file containing DDL information to create the table. |
force_delete type: boolean optional | If set to "true", this parameter will force the deletion of the table prior to its creation. |
criticality type: string optional | Task criticality, which defines the impact of an error on this task on the data operation's global status and behavior. Criticality can be: "critical", "warning", "transparent" or "break". See the criticality description in the SQL task above for more details. Default value: "critical" |
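A hypothetical table creation task sub-object might look as follows (names and paths are placeholders):

```json
{
  "task_type": "create_gbq_table",
  "id": "create_sales_table",
  "short_description": "Create the skeleton of the sales table",
  "gcp_project_id": "my-gcp-project",
  "bq_dataset": "my_dataset",
  "bq_table": "sales",
  "ddl_file": "ddl/sales.json",
  "force_delete": true,
  "criticality": "critical"
}
```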
Table copy task parameters
Parameter | Description |
---|---|
task_type type: string mandatory | In this case, the value has to be set to "copy_gbq_table". |
id type: string mandatory | ID of the task. It must be unique within the data operation. |
short_description type: string optional | Short description of what the task does. |
doc_md type: string optional | Path to a file containing a detailed description. The file must be in Markdown format. |
source_gcp_project_id type: string mandatory | GCP project ID for the source BigQuery table. |
source_bq_dataset type: string mandatory | BigQuery dataset for the source table. |
source_bq_table type: string mandatory | Name of the source table. |
gcp_project_id type: string optional | GCP project ID for the destination BigQuery table. Overrides default_gcp_project_id. |
bq_dataset type: string optional | Name of the BigQuery destination dataset. Overrides default_bq_dataset. |
destination_bq_table type: string mandatory | Name of the BigQuery destination table. |
destination_bq_table_date_suffix type: boolean optional | Enables/disables the date suffix to allow date partitioning in BigQuery. Default value: false |
destination_bq_table_date_suffix_format type: string mandatory if destination_bq_table_date_suffix is set to true | Date format for the table suffix. As of now, the only possible value is "%Y%m%d". |
criticality type: string optional | Task criticality, which defines the impact of an error on this task on the data operation's global status and behavior. Criticality can be: "critical", "warning", "transparent" or "break". See the criticality description in the SQL task above for more details. Default value: "critical" |
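A hypothetical table copy task sub-object might look as follows (project, dataset and table names are placeholders):

```json
{
  "task_type": "copy_gbq_table",
  "id": "copy_sales_data",
  "short_description": "Copy the sales table into the archive dataset",
  "source_gcp_project_id": "my-gcp-project",
  "source_bq_dataset": "my_dataset",
  "source_bq_table": "sales",
  "gcp_project_id": "my-gcp-project",
  "bq_dataset": "my_archive_dataset",
  "destination_bq_table": "sales_archive",
  "destination_bq_table_date_suffix": true,
  "destination_bq_table_date_suffix_format": "%Y%m%d",
  "criticality": "warning"
}
```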
Table deletion task parameters
Parameter | Description |
---|---|
task_type type: string mandatory | In this case, the value has to be set to "delete_gbq_table". |
id type: string mandatory | ID of the task. It must be unique within the data operation. |
short_description type: string optional | Short description of what the task does. |
doc_md type: string optional | Path to a file containing a detailed description. The file must be in Markdown format. |
bq_dataset type: string mandatory | BigQuery dataset for the table. |
bq_table type: string mandatory | Name of the table. Note that you can use the FD_DATE date templates. Example: my_table${{FD_DATE}} will be rendered with the TTT logical date => my_table$20240223. |
default_bq_data_location type: string optional | The default BigQuery data location. Default value: EU |
criticality type: string optional | Task criticality, which defines the impact of an error on this task on the data operation's global status and behavior. Criticality can be: "critical", "warning", "transparent" or "break". See the criticality description in the SQL task above for more details. Default value: "critical" |
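A hypothetical table deletion task sub-object might look as follows (dataset and table names are placeholders; the FD_DATE template targets the partition of the logical date):

```json
{
  "task_type": "delete_gbq_table",
  "id": "delete_sales_partition",
  "short_description": "Delete the partition of the logical date in the sales table",
  "bq_dataset": "my_dataset",
  "bq_table": "sales${{FD_DATE}}",
  "criticality": "critical"
}
```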