Prepare data

The third data operation of this tutorial will consist in preparing data within BigQuery tables.


Overview

The objective of this step is to create new BigQuery tables into which we will load and reorganize the contents of the tables created in the previous step. As in most cases, this will happen within one BigQuery dataset. For this, we will need:

  • a JSON file to configure the data operation,

  • a JSON file to trigger the workflow,

  • a JSON file for each table creation,

  • and a SQL file for each transfer of data into our new tables.

Create your files

Create the JSON file that configures the data operation

  1. Access your tailer-demo folder.

  2. Inside, create a folder named 3-Prepare-data for this new step.

  3. In this folder, create a JSON file named 000099-tailer-demo-prepare-data.json for your data operation.

  4. The data operation will load a temporary table and then, if the query runs correctly, copy the temporary table into the target table. Copy the following contents into your file:

    {
      "$schema": "http://jsonschema.tailer.ai/schema/table-to-table-veditor",
      "configuration_type": "table-to-table",
      "configuration_id": "000099-load_my_gbq_dataset_my_table",
      "short_description": "Prepare data for the Tailer demo",
      "account": "000099",
      "environment": "DEV",
      "activated": true,
      "archived": false,
      "start_date": "2023, 1, 23",
      "catchup": false,
      "schedule_interval": "None",
      "default_gcp_project_id": "my-gcp-project",
      "default_bq_dataset": "my_gbq_dataset",
      "default_write_disposition": "WRITE_TRUNCATE",
      "task_dependencies": [
        "load_temp_sales >> swap_sales_tables"
      ],
      "workflow": [
        {
          "task_type": "sql",
          "id": "load_temp_sales",
          "short_description": "Load temp sales table",
          "bq_dataset": "temp",
          "table_name": "sales",
          "sql_file": "load_sales.sql"
        },
        {
          "id": "swap_sales_tables",
          "task_type": "copy_gbq_table",
          "source_gcp_project_id": "my-gcp-project",
          "source_bq_dataset": "temp",
          "source_bq_table": "sales",
          "bq_dataset": "my-gcp-project",
          "destination_bq_table": "sales"
        }
      ]
    }
  5. Edit the following values:
     ◾ Replace my-gcp-project with the ID of the GCP project containing your BigQuery dataset in the default_gcp_project_id and source_gcp_project_id parameters.
     ◾ Replace my_gbq_dataset with the name of your working dataset.
     ◾ Also replace the dataset and table names in the configuration_id.

Create a SQL file

Create a SQL file in the same directory and name it load_sales.sql. It must contain a query that will load the sales table.
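
For example, a minimal load_sales.sql could look like the following sketch. The source table and column names here are assumptions: adapt them to the tables you created in the previous step.

    -- Illustrative query for load_sales.sql: the source table and columns
    -- below are placeholders, not part of the tutorial files.
    SELECT
      sale_id,
      product_id,
      CAST(quantity AS INT64) AS quantity,
      CAST(amount AS NUMERIC) AS amount,
      DATE(sale_date) AS sale_date
    FROM `my-gcp-project.my_gbq_dataset.sales_raw`

The result of this query will be written into the temp.sales table declared in the load_temp_sales task, with the WRITE_TRUNCATE write disposition defined in the configuration.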

Create the JSON file that triggers the workflow

Inside the 3-Prepare-data folder, create a file named workflow.json.

Copy the following contents into your file:

{
    "$schema": "http://jsonschema.tailer.ai/schema/workflow-veditor",
    "configuration_type": "workflow",
    "version":"2",
    "configuration_id": "000099-trigger_load_my_gbq_dataset_my_table_DEV",
    "short_description": "Launch the Tailer demo data load",
    "account": "000099",
    "environment": "PROD",
    "activated": true,
    "archived": false,
    "gcp_project_id": "my-project",
    "schedule_interval_reset": "None",
    "authorized_job_ids": [
      "storage-to-tables|000099|000099-tailer-demo-load-files-YOUR-NAME|DEV|sales_{{FD_BLOB_8}}-{{FD_DATE}}.csv"
    ],
    "target_dag": {
        "configuration_type":"table-to-table",
        "configuration_id":"000099-load_my_gbq_dataset_my_table_DEV"
    }
}

This workflow will trigger our Table-to-Table operation that loads the sales table each time a sales file is ingested by our Storage-to-Tables operation.

The authorized_job_ids parameter defines the job that triggers the target job. We need to insert here the job_id of the Storage-to-Tables operation. You can find it in Tailer Studio: navigate to the Storage-to-Tables runs section, find the last run for a sales file, and open the Run Details tab. Search for the job_id and copy it into the authorized_job_ids section.

Replace the configuration_id in the target_dag section with your configuration's configuration_id concatenated with "_DEV" (as its environment is DEV).

Deploy the data operation

Once your files are ready, you can deploy the data operation:

  1. Access your working folder by running the following command:

    cd "[path to your tailer folder]\tailer-demo\3-Prepare-data"
  2. To deploy the data operation, run the following command:

    tailer deploy configuration 000099-tailer-demo-prepare-data.json
  3. To trigger the workflow, run the following command:

    tailer deploy configuration workflow.json

When deploying, you may be asked to select a context (see Set constants with Context for more information). If you haven't deployed any context, choose "no context". You can also use the --context flag to specify the context of your choice, or NO_CONTEXT if that's what you want:

    tailer deploy configuration 000099-tailer-demo-prepare-data.json --context NO_CONTEXT
    tailer deploy configuration workflow.json --context NO_CONTEXT

Your data operation is now deployed, which means the new tables will shortly be created and loaded with data, and your data operation status is now visible in Tailer Studio.

Check the data operation status in Tailer Studio

Access Tailer Studio again:

  1. In the left navigation menu, select Tables-to-tables.

  2. In the Configurations tab, search for your data operation. You can see that its status is Activated, and in the top right corner that it has been deployed by you a few seconds ago.

  3. Check your workflow configuration. It should also be activated and deployed by you a few seconds ago.

  4. Copy a sales file into your Storage-to-Tables source bucket. It should automatically trigger a Storage-to-Tables run. If it is successful, it should then trigger the workflow and create a Table-to-Table run.

  5. When the run has completed successfully, you can check in BigQuery that your target table is loaded.
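
If you prefer the command line, you can also check the result with the bq tool (using the placeholder project and dataset names from this tutorial):

    # Count the rows of the freshly loaded target table.
    bq query --use_legacy_sql=false \
      'SELECT COUNT(*) AS row_count FROM `my-gcp-project.my_gbq_dataset.sales`'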

🚀 Further steps

We've seen here a very basic example of a Table-to-Table data operation that loads a temporary table and then copies it to the target destination when no error occurs.

We could go further by adding other steps and using other Tailer features, for instance:

  • create the temporary table using a task of type create_gbq_table. This way, you can specify a DDL for this table, add column descriptions, column types, and define partitioning and clustering fields (see the sketch after this list).

  • add a task that performs tests using expectations or custom asserts

  • add several SQL tasks
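
For illustration, such a create_gbq_table task added at the start of the workflow array might look like the sketch below. The parameter names, ddl_file in particular, are assumptions modeled on the tasks above; see the Tables to Tables configuration file reference for the exact schema.

    {
      "id": "create_temp_sales",
      "task_type": "create_gbq_table",
      "short_description": "Sketch only: create the temp sales table from a DDL file",
      "bq_dataset": "temp",
      "table_name": "sales",
      "ddl_file": "create_sales.json"
    }

The new task would then be chained first in task_dependencies, for instance "create_temp_sales >> load_temp_sales >> swap_sales_tables".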

