
Table to Table SQL and DDL files

Learn how to create the SQL and DDL files corresponding to the workflow tasks of a Table to Table data operation.


Overview

A SQL workflow is a sequence of tasks that feed tables, executed either in parallel or sequentially.

A workflow can be composed of the following task types:

  • SQL task ("sql"): instructions to load, merge and reorganize data.

  • Table creation task ("create_gbq_table"): instructions to create a destination table.

  • Table copy task ("copy_gbq_table"): instructions to duplicate a table (the necessary parameters are provided directly in the data operation configuration file).
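
To give an idea of how these task types combine, here is a purely illustrative excerpt of the workflow section of a Tables to Tables JSON configuration. Apart from the task type values listed above and default_write_disposition mentioned later on this page, the key names used here (workflow, id, sql_file, ddl_file) are assumptions made for this sketch; refer to the Tables to Tables configuration file page for the actual schema.

"workflow": [
    {
        "id": "create_customers",
        "task_type": "create_gbq_table",
        "ddl_file": "customers.json"
    },
    {
        "id": "customers",
        "task_type": "sql",
        "sql_file": "customers.sql"
    },
    {
        "id": "copy_customers_to_partner",
        "task_type": "copy_gbq_table"
    }
]

In this sketch, each "sql" task points to a .sql file named after the task, each "create_gbq_table" task points to a DDL file, and "copy_gbq_table" tasks need no extra file: they are described entirely in the configuration itself.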

SQL tasks

SQL tasks are the steps of the workflow. Each SQL task is defined with a .sql file that contains the query. You can write the queries directly in the BigQuery query editor and then save them to .sql files.

The file can contain a SQL query or a SQL script (like assertions or expectations).

The name of the SQL file should be the same as the SQL task.

Example:

-- Deduplicate customers: keep only the most recent row for each customer_id
SELECT
    customer_id,
    optin,
    cast(creation_date as date) as creation_date
FROM `referential.customers`
QUALIFY ROW_NUMBER() OVER(dedup_pk) = 1
WINDOW dedup_pk as (
  PARTITION BY customer_id
  ORDER BY update_date desc, import_date desc, tlr_ingestion_timestamp_utc desc
)

Table creation tasks

Once the SQL queries are ready, you need to use one or several DDL files to create the destination BigQuery tables that will contain the output data.

DDL Example

{
    "bq_table_description": "Describe the content of the table. The description will be attached to the table bigquery Metadata",
    "bq_table_schema": [
        {
            "name": "field_date",
            "type": "DATE",
            "description": "Describe the field_date"
        },
        {
            "name": "field_2",
            "type": "STRING",
            "description": "Describe the field_2"
        },
        {
            "name": "field_3",
            "type": "INTEGER",
            "description": "Describe the field_3",
            "mode": "REQUIRED"
        }
    ],
    "bq_table_clustering_fields": ["field_2", "field_3"]
    "bq_table_timepartitioning_field": "field_date",
    "bq_table_timepartitioning_type": "MONTH",
    "bq_table_timepartitioning_expiration_ms": "86400000",
    "bq_table_timepartitioning_require_partition_filter": false
}

DDL Parameters

bq_table_description (type: string, mandatory)

Description of the BigQuery table.

bq_table_schema (type: array, mandatory)

BigQuery table schema, i.e. a list of fields, one for each column the table will contain.

Each field is described with three attributes:

  • name

  • type

  • description

An optional mode attribute (for example "REQUIRED", as in the example above) can also be set.

bq_table_clustering_fields (type: array, optional)

List of fields used when clustering is enabled.

The table data will be automatically organized based on the contents of the fields you specify. Their order determines the sort order of the data.

If this parameter is set, time partitioning will be automatically enabled on the table. If you don't set partitioning parameters, default values will be used.

bq_table_timepartitioning_field (type: string, optional)

If this parameter is set, the table will be partitioned by this field. If not, the table will be partitioned by the pseudo-column _PARTITIONTIME.

The field must be a top-level TIMESTAMP or DATE field. Its mode must be NULLABLE or REQUIRED.

Note: You can set this parameter to a field that equals DATE(''). Then, if you relaunch an execution for a given partition, and if default_write_disposition is set to "WRITE_APPEND" in the JSON configuration file, Tailer will check whether the corresponding partition already exists in the table:

  • If it does, Tailer will delete it and replace it with the current execution's data.

  • If not, Tailer will append the new data.

bq_table_timepartitioning_type (type: string, optional)

Sets the partition type. Use one of the following values: "HOUR", "DAY", "MONTH" or "YEAR". If not present, the default is "DAY".

bq_table_timepartitioning_expiration_ms (type: integer, optional)

Number of milliseconds for which to keep the storage for a partition.

bq_table_timepartitioning_require_partition_filter (type: boolean, optional)

If set to true, queries over the partitioned table must specify a partition filter that can be used for partition elimination.
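
For instance, if bq_table_timepartitioning_require_partition_filter were set to true for the DDL example above (partitioned monthly on field_date), every query on the table would have to filter on that column. The dataset and table names below are hypothetical:

SELECT field_2, field_3
FROM `my_dataset.my_table`
WHERE field_date >= '2024-01-01'   -- partition filter, used for partition elimination
  AND field_date < '2024-02-01'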

DDL Data Types

Tailer Platform supports the following data types (refer to the BigQuery documentation for more information).

Numeric types

  • int64: Integers are numeric values that do not have fractional components. They range from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.

  • float64: Floating point values are approximate numeric values with fractional components.

  • numeric: This data type represents decimal values with 38 decimal digits of precision and 9 decimal digits of scale. (Precision is the number of digits that the number contains; scale is how many of these digits appear after the decimal point.) It is particularly useful for financial calculations.

Boolean type

  • boolean: This data type supports the true, false, and null values. It can perform some basic conversions, such as 'true', 'True', True, or 1 becoming true.

String type

  • string: Variable-length character data. When converting data from string to a different data type, make sure to use safe_cast when you're unsure about the data quality.
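
For instance, reusing the creation_date column from the SQL example above (and assuming it is stored as a string), SAFE_CAST returns NULL for malformed values where a plain CAST would make the whole query fail:

SELECT
    customer_id,
    SAFE_CAST(creation_date AS DATE) AS creation_date   -- NULL instead of an error on bad values
FROM `referential.customers`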

Bytes type

  • bytes: Variable-length binary data. This data type is rarely used but can be useful for characters with unusual encoding.

Time types

Only the date, datetime and timestamp data types (not time) allow table partitioning.

As time zone management is difficult with BigQuery, prefer the UTC format.

  • date: This data type represents a calendar date. It includes the year, month, and day.

  • time: This data type represents a time, as might be displayed on a watch, independent of a specific date. It includes the hour, minute, second, and subsecond.

  • datetime: This data type represents a date and time, as they might be displayed on a calendar or clock. It includes the year, month, day, hour, minute, second, and subsecond.

  • timestamp: This data type represents an absolute point in time, with microsecond precision.
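
To illustrate the UTC recommendation, here is a small query reusing the tlr_ingestion_timestamp_utc column from the example above: converting a timestamp to a date uses UTC by default, and an explicit time zone must be passed to obtain a local date.

SELECT
    DATE(tlr_ingestion_timestamp_utc) AS ingestion_date_utc,                    -- UTC by default
    DATE(tlr_ingestion_timestamp_utc, "Europe/Paris") AS ingestion_date_paris   -- explicit time zone
FROM `referential.customers`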

Table copy tasks

If necessary, you can duplicate an existing table using a table copy task, for example to share the contents of a table with a partner inside their own dataset. Although it would be possible to use a SQL script with SELECT *, a copy task is more efficient. The parameters set in the JSON configuration file are sufficient: no specific file is required for this type of task.
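
For comparison, the SQL script that a copy task replaces would be nothing more than a full read of the source table:

-- A copy task achieves the same result without running this query over the whole table
SELECT *
FROM `referential.customers`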
