Export data

The fifth and final data operation of this tutorial will consist of exporting our data back to a bucket.

Overview

During this step, we will take our aggregated store data, located in a BigQuery table, and export it to a CSV file in a Google Cloud Storage bucket, so it can later be used with other tools such as a warehouse management system.

Create a bucket and a folder

For the detailed procedure on how to create GCS buckets (manually or using gsutil), refer to this page.

  1. Create a bucket in the project of your choice. Bucket names must be globally unique, so pick any name you like that isn't already taken, and select the settings that you want.‌

  2. In this bucket, create a folder named store_clustering_export that will contain our future export file.
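
If you prefer the command line, here is a minimal sketch using gsutil. The bucket name and location below are placeholders only; substitute your own globally unique name:

    # Create the bucket (name and location are examples and must be replaced)
    gsutil mb -l EU gs://my-tailer-demo-export-bucket

    # GCS folders are simply object prefixes: uploading a placeholder object
    # makes the store_clustering_export folder visible in the console
    touch placeholder.txt
    gsutil cp placeholder.txt gs://my-tailer-demo-export-bucket/store_clustering_export/placeholder.txt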

Create your configuration files

Create the JSON file that configures the data pipeline operation

  1. Access your tailer-demo folder.

  2. Inside, create a folder named 5-Export-data for this new step.

  3. In this folder, create a JSON file named 000099-tailer-demo-export-data.json for your data operation.

  4. Copy the following contents into your file:

    {
        "$schema": "http://jsonschema.tailer.ai/schema/table-to-storage-veditor",
        "configuration_type": "table-to-storage",
        "configuration_id": "000099-tailer-demo-export_YOUR_NAME",
        "short_description": "Short description of the job",
        "environment": "DEV",
        "account": "000099",    
        "version": "3",
        "activated": true,
        "archived": false,
        "start_date" : "2023, 2, 10",
        "schedule_interval" : "None",
        
        "dest_gcp_project_id": "my-gcp-project",
        "gcs_dest_bucket": "my-gcs-bucket",
        "gcs_dest_prefix": "output_YOUR_NAME",
      
        "print_header": true,
        "destination_format": "CSV",
        "field_delimiter": ",",
        
        "copy_table": true,
        "dest_gcp_project_id": "my-gcp-project",
        "dest_gbq_dataset": "my_dataset",
        "dest_gbq_table_suffix": "dag_execution_date",
        
        "tasks": [
            {
                "task_id": "demo_export",
                "sql_file" : "my_SQL_file.sql",
                "output_filename" : "demo_export_YOUR_NAME_{{FD_DATE}}.csv",
                "dest_gbq_table": "demo_export_YOUR_NAME"
            }
        ]
    }
  5. Edit the following values (a command-line shortcut is sketched after this list):
     ◾ Replace my-gcp-project with the ID of the GCP project containing the source table. This is where the SQL query will be run.
     ◾ Replace my_dataset with the name of the dataset where you want to create a copy of the table generated by the SQL request.
     ◾ Replace my-gcs-bucket with the name of the bucket you've just created, where the export file will be generated.
     ◾ If you share the project with others, don't forget to personalize your outputs so you won't overwrite your teammates' work: search for "_YOUR_NAME" and replace all occurrences.

  6. Save your file.
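
A quick way to do the "_YOUR_NAME" substitution is from a Unix-like shell; this is only a sketch, and alice is a placeholder for your own name:

    # Replace every occurrence of _YOUR_NAME in the configuration file
    # (on macOS, use: sed -i '' 's/_YOUR_NAME/_alice/g' ...)
    sed -i 's/_YOUR_NAME/_alice/g' 000099-tailer-demo-export-data.json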

Create a SQL file

  1. Inside the 5-Export-data folder, create a file named export_data.sql.

  2. Copy the following contents into the export_data.sql file:

    SELECT * FROM `my-gbq-dataset.store_clustering`
  3. Replace my-gbq-dataset with the name of the dataset you used in the previous step (the one containing the store_clustering table).

  4. Save your file.
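
Before deploying, you can optionally check that the source table is reachable and that the query returns rows. This sketch uses the bq command-line tool and the same my-gbq-dataset placeholder as the SQL file:

    # Quick sanity check of the source table
    # (you may need to add --project_id=<your GCP project>)
    bq query --use_legacy_sql=false 'SELECT COUNT(*) AS row_count FROM `my-gbq-dataset.store_clustering`'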

Create the JSON file that will trigger the workflow

  1. Inside the 5-Export-data folder, create a file named workflow.json.

  2. Copy the following contents into your file:

    {
        "$schema": "http://jsonschema.tailer.ai/schema/workflow-veditor",
        "configuration_type": "workflow",
        "version":"2",
        "configuration_id": "000099-tailer-demo-export-data_YOUR_NAME",
        "short_description": "Launch the Tailer demo data load",
        "account": "000099",
        "environment": "DEV",
        "activated": true,
        "archived": false,
        "gcp_project_id": "my-project",
        "authorized_job_ids": [
          "gbq-to-gbq|000099-load_my_gbq_dataset_my_table_DEV"
        ],
        "target_dag": {
            "configuration_type":"table-to-storage",
            "configuration_id":"000099-tailer-demo-export_YOUR_NAME_DEV"
        }
    }
  3. Save your file.

Deploy the data operation

Once your files are ready, you can deploy the data operation:

  1. Access your working folder by running the following command:

    cd "[path to your tailer folder]\tailer-demo\5-Export-data"
  2. To deploy the data operation, run the following command:

    tailer deploy configuration 000099-tailer-demo-export-data.json
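
If you want, you can deploy the workflow configuration created above with the same command (shown here as a sketch; the export will be triggered manually in the next section anyway):

    tailer deploy configuration workflow.json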

Deploying the workflow at this stage would not launch it: the workflow is only triggered by the execution of the previous step (building predictions). For now, we will run it manually so we can see the result. Once the whole pipeline is set up, the workflow will run automatically, starting from its first step (copying files), whenever new files are added to the source bucket.

Run your workflow manually

Access Tailer Studio.‌

  1. In the left navigation menu, select Table-to-storage.

  2. In the Configurations tab, search for your data operation, 000099-tailer-demo-export_YOUR_NAME.

  3. Click the data operation ID to display its details.

  4. In the upper right corner, click on Launch.

Check the result in GCS

Access the GCS folder in the bucket you've just created. Your data should now appear in the form of a CSV file that you can export to a different system.
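
To verify from the command line, you can list the destination prefix. This sketch assumes the gcs_dest_bucket and gcs_dest_prefix values from the configuration above (adjust the path if you pointed the export at the store_clustering_export folder instead):

    # List the exported CSV files
    gsutil ls gs://my-gcs-bucket/output_YOUR_NAME/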

🚀 Further steps

You can check the full Tables to Storage documentation and try other parameters:

  • Add some tasks to perform different extractions

  • Create a JSON extract or compress the output using GZIP

  • Insert the run date in your query using the "sql_query_template" parameters

  • Send the data file to a partner using a Storage to Storage operation, or ingest it into Firestore using a specific VM Launcher operation

  • Insert environment variables in your SQL using a Context configuration

You can now add more files into the input folder from the first step of this tutorial to see the whole pipeline play out!
