Table to Table SQL and DDL files
Learn how to create the SQL and DDL files corresponding to the workflow tasks of a Table to Table data operation.
Overview
A SQL workflow is a sequence of tasks that feed tables in parallel or sequentially.
A workflow can be composed of the following task types:
SQL task ("sql"): instructions to load, merge and reorganize data.
Table creation task ("create_gbq_table"): instructions to create a destination table.
Table copy task ("copy_gbq_table"): instructions to duplicate a table (provided in the data operation configuration file).
SQL tasks
SQL tasks are steps of the workflow. Each SQL task is defined with a .sql file that contains the query. You can write the queries directly in the query editor of BigQuery and then save them to .sql files.
The file can contain a SQL query or a SQL script (like assertions or expectations).
The SQL file should have the same name as the SQL task it defines.
Example:
Table creation tasks
Once the SQL queries are ready, you need to use one or several DDL files to create the destination BigQuery tables that will contain the output data.
DDL Example
DDL Parameters
bq_table_description
type: string
mandatory
Description of the BigQuery table.
bq_table_schema
type: array
mandatory
BigQuery table schema. It contains a list of fields, one for each column of the table.
Each field has three attributes:
name
type (see DDL Data Types)
description
bq_table_clustering_fields
type: array
optional
List of fields used when clustering is enabled.
The table data will be automatically organized based on the contents of the fields you specify. Their order determines the sort order of the data.
If this parameter is set, time partitioning will be automatically enabled on the table. If you don't set partitioning parameters, default values will be used.
bq_table_timepartitioning_field
type: string
optional
If this parameter is set, the table will be partitioned by this field.
If not, the table will be partitioned by the pseudo-column _PARTITIONTIME.
The field must be a top-level TIMESTAMP or DATE field. Its mode must be NULLABLE or REQUIRED.
(Refer to BigQuery documentation for more information.)
Note: You can set this parameter to a field that equals DATE(''). Then, if you relaunch an execution for a given partition, and if default_write_disposition is set to "WRITE_APPEND" in the JSON configuration file, Tailer will check whether the corresponding partition already exists in the table:
If it does, Tailer will delete the partition and replace it with the data of the current execution.
If not, Tailer will add the data of the current execution as a new partition.
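The replace-or-add behavior described in this note can be sketched in a few lines of Python. This is only an illustration of the logic, not Tailer's implementation: the table is modeled as a hypothetical in-memory dictionary mapping a partition date to its rows, and write_partition is a made-up helper name.

```python
from datetime import date


def write_partition(table, partition_day, rows):
    """Replace the partition if it already exists, otherwise add it.

    `table` is a hypothetical stand-in for a partitioned table:
    a dict mapping a partition date to the list of rows it holds.
    """
    if partition_day in table:
        # The partition exists: drop its old rows before writing new ones.
        del table[partition_day]
    table[partition_day] = list(rows)
    return table


table = {date(2024, 1, 1): ["old row"]}
write_partition(table, date(2024, 1, 1), ["new row"])    # replaces the partition
write_partition(table, date(2024, 1, 2), ["other row"])  # adds a new partition
```

Relaunching an execution for the same day therefore does not duplicate rows, which is the point of combining this parameter with "WRITE_APPEND".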
bq_table_timepartitioning_type
type: string
optional
Sets the partition type. Use one of the following values: "HOUR", "DAY", "MONTH" or "YEAR".
If not present, default is "DAY".
bq_table_timepartitioning_expiration_ms
type: integer
optional
Number of milliseconds for which to keep the storage for a partition.
(Refer to BigQuery documentation for more information.)
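Because the expiration is expressed in milliseconds, it is easy to get the number of zeros wrong. The short Python helper below (expiration_ms is a hypothetical name used only for illustration) converts a retention period in days into the integer value to set.

```python
from datetime import timedelta


def expiration_ms(days):
    """Convert a retention period in days to whole milliseconds."""
    return timedelta(days=days) // timedelta(milliseconds=1)


ninety_days_ms = expiration_ms(90)
print(ninety_days_ms)  # 7776000000
```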
bq_table_timepartitioning_require_partition_filter
type: boolean
optional
If set to true, every query on the partitioned table must specify a partition filter that can be used for partition elimination.
(Refer to BigQuery documentation for more information.)
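To see how the parameters above fit together, here is a minimal Python sketch that builds a DDL definition as a dictionary and checks it against the documented rules (mandatory parameters, allowed partition types). The parameter names come from this page; the flat layout, the "type" key inside each schema field, the table and column names, and the check_ddl helper are assumptions made for illustration, so the structure Tailer actually expects may differ.

```python
import json

MANDATORY = ("bq_table_description", "bq_table_schema")
PARTITION_TYPES = {"HOUR", "DAY", "MONTH", "YEAR"}


def check_ddl(ddl):
    """Return a list of problems found in a DDL dictionary (empty if none)."""
    problems = [
        f"missing mandatory parameter: {key}"
        for key in MANDATORY
        if key not in ddl
    ]
    # "DAY" is the documented default when the parameter is absent.
    ptype = ddl.get("bq_table_timepartitioning_type", "DAY")
    if ptype not in PARTITION_TYPES:
        problems.append(f"invalid partition type: {ptype}")
    return problems


# Hypothetical table: all names and descriptions below are illustrative.
example_ddl = {
    "bq_table_description": "Daily sales per store",
    "bq_table_schema": [
        {"name": "sale_date", "type": "date", "description": "Day of the sale"},
        {"name": "store_id", "type": "string", "description": "Store identifier"},
        {"name": "amount", "type": "numeric", "description": "Total amount sold"},
    ],
    "bq_table_clustering_fields": ["store_id"],
    "bq_table_timepartitioning_field": "sale_date",
    "bq_table_timepartitioning_type": "DAY",
    "bq_table_timepartitioning_require_partition_filter": True,
}

print(json.dumps(example_ddl, indent=2))
print(check_ddl(example_ddl))
```

Running a check like this before deploying can catch a missing description or a misspelled partition type early.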
DDL Data Types
Tailer Platform supports the following data types.
Numeric types
int64
Integers are numeric values that do not have fractional components.
They range from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.
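These bounds are those of a signed 64-bit integer. As a quick illustration, the Python sketch below recomputes them and checks whether a value fits before it is loaded; fits_int64 is a hypothetical helper, not part of Tailer or BigQuery.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def fits_int64(value):
    """Return True if `value` can be stored in an int64 column."""
    return INT64_MIN <= value <= INT64_MAX


print(INT64_MIN)  # -9223372036854775808
print(INT64_MAX)  # 9223372036854775807
print(fits_int64(2**63))  # False
```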
float64
Floating point values are approximate numeric values with fractional components.
numeric
This data type represents decimal values with 38 decimal digits of precision and 9 decimal digits of scale. (Precision is the number of digits that the number contains. Scale is how many of these digits appear after the decimal point.)
It is particularly useful for financial calculations.
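The difference between approximate and exact decimal arithmetic is easy to demonstrate. The sketch below uses Python's float and decimal.Decimal as stand-ins for float64 and numeric; it illustrates the principle only and does not reproduce BigQuery's own implementation.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly.
float_total = 0.1 + 0.2

# Decimal arithmetic keeps the exact decimal digits.
decimal_total = Decimal("0.1") + Decimal("0.2")

print(float_total == 0.3)               # False
print(decimal_total == Decimal("0.3"))  # True
```

For monetary amounts, such rounding differences can accumulate over many rows, which is why an exact decimal type is preferred.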
Boolean type
boolean
This data type supports the true, false, and null values. It can perform some basic conversions, such as 'true', 'True', True, or 1 becoming true.
String type
string
Variable-length character data.
When converting data from string to a different data type, make sure to use safe_cast when you're unsure about the data quality.
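In BigQuery, safe_cast returns NULL instead of raising an error when a value cannot be converted. The Python sketch below mimics that behavior for integers to show why it matters with dirty data; safe_cast_int is a hypothetical helper, and Python's parsing rules are not identical to BigQuery's.

```python
def safe_cast_int(text):
    """Return the integer value of `text`, or None when it cannot be converted."""
    try:
        return int(text)
    except (TypeError, ValueError):
        return None


raw_values = ["42", "abc", None, "7"]
converted = [safe_cast_int(value) for value in raw_values]
print(converted)  # [42, None, None, 7]
```

A strict conversion would stop at the first bad value, whereas the safe variant lets the load finish and leaves the invalid values as nulls to be inspected later.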
Bytes type
bytes
Variable-length binary data. This data type is rarely used but can be useful for characters with unusual encoding.
Time types
Only the date, datetime and timestamp data types (not time) allow table partitioning.
Because time zone management is difficult in BigQuery, prefer storing time values in UTC.
date
This data type represents a calendar date. It includes the year, month, and day.
time
This data type represents a time, as might be displayed on a watch, independent of a specific date. It includes the hour, minute, second, and subsecond.
datetime
This data type represents a date and time, as they might be displayed on a calendar or clock. It includes the year, month, day, hour, minute, second, and subsecond.
timestamp
This data type represents an absolute point in time, with microsecond precision.
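The distinction between datetime (a calendar and clock reading) and timestamp (an absolute point in time) can be illustrated with Python's standard library, used here as an analogy rather than as a description of BigQuery internals. A naive datetime object plays the role of datetime, and a UTC-aware one plays the role of timestamp; the dates and the UTC+2 offset are arbitrary examples.

```python
from datetime import datetime, timedelta, timezone

# datetime-like value: a clock reading with no time zone attached.
civil = datetime(2024, 3, 10, 8, 30)

# timestamp-like value: an absolute instant, expressed in UTC.
instant = datetime(2024, 3, 10, 8, 30, tzinfo=timezone.utc)

# The same instant as seen from a UTC+2 time zone.
local_view = instant.astimezone(timezone(timedelta(hours=2)))

print(civil.tzinfo)           # None
print(instant == local_view)  # True
print(local_view.hour)        # 10
```

The instant stays the same whatever the zone it is displayed in, which is why storing values in UTC avoids ambiguity.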
Table copy tasks
If necessary, you can duplicate an existing table using a table copy task, for example to share the contents of a table with a partner inside their own dataset. Although it would be possible to use an SQL script with SELECT *, a copy task is more efficient. The parameters set in the JSON configuration file are sufficient; no specific file is required for this type of task.