Load files into BigQuery tables

The second data operation of this tutorial will load the contents of our files into BigQuery tables.

Create your configuration files

Create the JSON file for the data pipeline operation

  1. Access your tailer-demo folder.

  2. Inside, create a folder named 2-Load_files for this new step.

  3. In this folder, create a JSON file named 000099-tailer-demo-load-files.json for your data operation.

  4. Copy the following contents into your file:

    {
      "$schema": "http://jsonschema.tailer.ai/schema/storage-to-tables-veditor",
      "configuration_type": "storage-to-tables",
      "configuration_id": "000099-tailer-demo-load-files-YOUR-NAME",
      "version": "2",
      "environment": "DEV",
      "account": "000099",
      "activated": true,
      "archived": false,
      "short_description": "This data operation loads files into BigQuery tables.",
      "doc_md": "tailer-demo-stt.md",
      "source": {
        "type": "gcs",
        "gcp_project_id": "my-gcp-project",
        "gcs_source_bucket": "my-bucket",
        "gcs_source_prefix": "input-folder-YOUR-NAME",
        "gcs_archive_prefix": "archive-folder-YOUR-NAME",
        "gcp_credentials_secret": {
          "cipher_aes": "xxx",
          "ciphertext": "xxx",
          "enc_session_key": "xxx",
          "tag": "xxx"
        }
      },
      "destinations": [
        {
          "type": "bigquery",
          "gcp_project_id": "my-gcp-project",
          "gbq_dataset": "my_gbq_dataset_YOUR_NAME",
          "source_format": "CSV",
          "bq_load_job_ignore_unknown_values": true,
          "create_disposition": "CREATE_IF_NEEDED",
          "write_disposition": "WRITE_TRUNCATE",
          "skip_leading_rows": 1,
          "field_delimiter": "|",
          "gcp_credentials_secret": {
            "cipher_aes": "xxx",
            "ciphertext": "xxx",
            "enc_session_key": "xxx",
            "tag": "xxx"
          },
          "tables": [
            {
              "table_name": "stores",
              "short_description": "Store repository",
              "filename_template": "stores-{{FD_DATE}}-{{FD_TIME}}.csv",
              "ddl_file": "ddl/stores.json",
              "doc_md": "ddl/stores.md"
            },
            {
              "table_name": "products",
              "short_description": "Product repository",
              "filename_template": "products-{{FD_DATE}}-{{FD_TIME}}.csv",
              "ddl_file": "ddl/products.json"
            },
            {
              "table_name": "sales",
              "short_description": "Daily Iowa Liquor sales",
              "filename_template": "sales_{{FD_BLOB_8}}-{{FD_DATE}}.csv",
              "ddl_file": "ddl/sales.json"
            },
            {
              "table_name": "sales_daily",
              "short_description": "Daily Iowa Liquor sales",
              "filename_template": "sales_{{FD_DATE}}.csv",
              "ddl_file": "ddl/sales_daily.json"
            }
          ]
        }
      ]
    }
  5. Edit the following values:

    ◾ In the source section, replace my-gcp-project with the ID of the GCP project containing the source bucket.

    ◾ In the source section, replace my-bucket with the name of the GCS bucket containing the input files (output files from the previous step).

    ◾ In the source section, replace the value of the gcp_credentials_secret parameter with the service account credentials for the GCP project containing the source bucket.

    ◾ In the destinations section, replace my-gcp-project with the ID of the GCP project containing the target dataset. It can be the same as in the source section or a different one.

    ◾ In the destinations section, replace my_gbq_dataset_YOUR_NAME with the name of the dataset that will contain the tables. You need to create this dataset in your destination project beforehand, as the storage-to-tables data operation won't create it for you.

    ◾ In the destinations section, replace the value of the gcp_credentials_secret parameter with the service account credentials for the GCP project containing the target dataset.

    ◾ If you share the demo project with other developers, replace YOUR-NAME in the configuration_id with a personal value, such as your name. This way, you won't overwrite a configuration deployed by someone else. You should also add your name in the source's gcs_source_prefix and gcs_archive_prefix, and in the destinations' gbq_dataset, to avoid any interference with another developer's data operation. An example of an edited source section is shown after this list.

  6. Create a Markdown file named tailer-demo-stt.md. You can use it freely to describe the data operation.
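
For example, assuming your source bucket lives in a project named my-company-project, the bucket is called tailer-demo-bucket and your name is alice (all hypothetical values), the edited source section could look like the sketch below. The gcp_credentials_secret block keeps the encrypted values you generated when encrypting your credentials.

    "source": {
      "type": "gcs",
      "gcp_project_id": "my-company-project",
      "gcs_source_bucket": "tailer-demo-bucket",
      "gcs_source_prefix": "input-folder-alice",
      "gcs_archive_prefix": "archive-folder-alice",
      "gcp_credentials_secret": {
        "cipher_aes": "xxx",
        "ciphertext": "xxx",
        "enc_session_key": "xxx",
        "tag": "xxx"
      }
    }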

Create the table schema files

  1. Inside the 2-Load_files folder, create a folder named ddl. It will contain the table schema files.

  2. Inside the ddl folder, create four files:
    ◾ stores.json
    ◾ products.json
    ◾ sales.json
    ◾ sales_daily.json

  3. Copy the following contents into the stores.json file:

    {
      "schema": [
        {
          "name": "store_number",
          "type": "STRING",
          "description": "Unique number of the store that ordered the liquor."
        },
        {
          "name": "store_name",
          "type": "STRING",
          "description": "Name of the store that ordered the liquor."
        },
        {
          "name": "address",
          "type": "STRING",
          "description": "Address of the store that ordered the liquor."
        },
        {
          "name": "city",
          "type": "STRING",
          "description": "City of the store that ordered the liquor."
        },
        {
          "name": "zip_code",
          "type": "STRING",
          "description": "Zip code of the store that ordered the liquor."
        },
        {
          "name": "store_location",
          "type": "STRING",
          "description": "Location of the store that ordered the liquor."
        },
        {
          "name": "county_number",
          "type": "STRING",
          "description": "Iowa county number of the store that ordered the liquor."
        },
        {
          "name": "county",
          "type": "STRING",
          "description": "County of the store that ordered the liquor."
        }
      ]
    }
  4. Copy the following contents into the products.json file.

    {
      "schema": [
        {
          "name": "category",
          "type": "STRING",
          "description": "Category code of the liquor."
        },
        {
          "name": "category_name",
          "type": "STRING",
          "description": "Category of the liquor."
        },
        {
          "name": "vendor_number",
          "type": "STRING",
          "description": "The vendor number of the company for the liquor brand."
        },
        {
          "name": "item_number",
          "type": "STRING",
          "description": "Item number for each individual liquor product."
        },
        {
          "name": "item_description",
          "type": "STRING",
          "description": "Description of each individual liquor product."
        },
        {
          "name": "pack",
          "type": "STRING",
          "description": "The number of bottles in one box for the liquor."
        },
        {
          "name": "bottle_volume_ml",
          "type": "STRING",
          "description": "Volume of each liquor bottle in milliliters."
        },
        {
          "name": "state_bottle_cost",
          "type": "STRING",
          "description": "The amount the State paid for each bottle of liquor."
        },
        {
          "name": "state_bottle_retail",
          "type": "STRING",
          "description": "The amount the store paid for each bottle of liquor."
        }
      ]
    }
  5. Copy the following contents into the sales.json file.

    {
      "schema": [
        {
          "name": "invoice_and_item_number",
          "type": "STRING",
          "description": "Concatenated invoice and line number of the liquor order."
        },
        {
          "name": "date",
          "type": "STRING",
          "description": "Date of order."
        },
        {
          "name": "store_number",
          "type": "STRING",
          "description": "Unique number of the store that ordered the liquor."
        },
        {
          "name": "item_number",
          "type": "STRING",
          "description": "Item number for each individual liquor product ordered."
        },
        {
          "name": "bottles_sold",
          "type": "STRING",
          "description": "The number of bottles of liquor ordered by the store."
        },
        {
          "name": "bottle_volume_ml",
          "type": "STRING",
          "description": "Volume of each liquor bottle ordered in milliliters."
        },
        {
          "name": "sale_dollars",
          "type": "STRING",
          "description": "Total cost of liquor order."
        },
        {
          "name": "volume_sold_liters",
          "type": "STRING",
          "description": "Total volume of liquor ordered in liters."
        },
        {
          "name": "volume_sold_gallons",
          "type": "STRING",
          "description": "Total volume of liquor ordered in gallons."
        }
      ]
    }
  6. Copy the following contents into the sales_daily.json file.

    {
      "schema": [
        {
          "name": "invoice_and_item_number",
          "type": "STRING",
          "description": "Concatenated invoice and line number of the liquor order."
        },
        {
          "name": "date",
          "type": "STRING",
          "description": "Date of order"
        },
        {
          "name": "store_number",
          "type": "STRING",
          "description": "Unique number of the store that ordered the liquor."
        },
        {
          "name": "item_number",
          "type": "STRING",
          "description": "Item number for each individual liquor product ordered."
        },
        {
          "name": "bottles_sold",
          "type": "STRING",
          "description": "The number of bottles of liquor ordered by the store."
        },
        {
          "name": "bottle_volume_ml",
          "type": "STRING",
          "description": "Volume of each liquor bottle ordered in milliliters."
        },
        {
          "name": "sale_dollars",
          "type": "STRING",
          "description": "Total cost of liquor order."
        },
        {
          "name": "volume_sold_liters",
          "type": "STRING",
          "description": "Total volume of liquor ordered in liters."
        },
        {
          "name": "volume_sold_gallons",
          "type": "STRING",
          "description": "Total volume of liquor ordered in gallons."
        }
      ]
    }
  7. Create Markdown files for each DDL file. You can use them freely to describe the table schemas.
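
For example (the content is free-form and purely illustrative), ddl/stores.md could contain:

    # stores
    Store repository for the Tailer demo.
    One row per Iowa liquor store, loaded from the stores-*.csv files.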

By default, all the database fields created in the DDL files have the "string" type. This will be modified during the next data pipeline operation if necessary.
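
For illustration only, a DDL field definition can carry any BigQuery type instead of STRING. This is not needed in this tutorial, since typing is handled in the next step, but a typed entry might look like this:

    {
      "name": "sale_dollars",
      "type": "FLOAT",
      "description": "Total cost of liquor order."
    }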

Deploy the data operation

Once your files are ready, you can deploy the data operation:

  1. Access your working folder by running the following command:

    cd "[path to your tailer folder]\jarvis-demo\2-Load_files"
  2. To deploy the data operation, run the following command:

    tailer deploy configuration 000099-tailer-demo-load-files.json

You may be asked to select a context (see the Context documentation for more information). If you haven't deployed any context, then choose "no context". You can also use the --context flag to specify the context of your choice, or NO_CONTEXT if that's what you want:

    tailer deploy configuration 000099-tailer-demo-load-files.json --context NO_CONTEXT

Check the data operation in Tailer Studio

Your data operation is now deployed, which means the files will shortly be loaded into tables, and your data operation's status is now visible in Tailer Studio.

Access Tailer Studio again:

  1. In the left navigation menu, select Storage-to-tables.

  2. In the Configurations tab, search for your data operation, 000099-tailer-demo-load-files-YOUR-NAME. You can see that its status is Activated.

  3. Click the data operation ID to display its parameters and full JSON file, or to leave comments about it. In the Tables section, you can access the table schema, parameters, and documentation provided in the Markdown files.

🗳️ Check the result in GCP

Now that our configuration is deployed, we can test it. Let's mimic production behavior. Access the folders you created when preparing the demonstration environment:

  • In your source bucket, copy a file into the input folder. The file name must match one of the filename_template patterns specified in the configuration (see the examples after this list).

  • In Tailer Studio, in the Storage-to-Tables section, Runs tab, you should see a run for your data operation. It should first appear as "running" and quickly reach the "success" status.

  • In your source bucket, the input-folder-YOUR-NAME folder should now be empty.

  • In your source bucket, the archive-folder-YOUR-NAME folder should contain a subfolder for each input file, named after the date found in the file name.

  • Your destination dataset should contain a table for each type of input file.
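
For illustration, here are hypothetical file names that would match the configured templates, assuming {{FD_DATE}} matches an 8-digit date, {{FD_TIME}} a 6-digit time and {{FD_BLOB_8}} any 8 characters, together with the gsutil and bq commands you could use to upload a file and list the resulting tables:

    # Hypothetical matching file names:
    #   stores-20240102-093000.csv    -> table "stores"
    #   products-20240102-093000.csv  -> table "products"
    #   sales_20240101-20240102.csv   -> table "sales"
    #   sales_20240102.csv            -> table "sales_daily"

    # Upload one of them to the input folder of your source bucket:
    gsutil cp stores-20240102-093000.csv gs://my-bucket/input-folder-YOUR-NAME/

    # Once the run has succeeded, list the tables in the destination dataset:
    bq ls my-gcp-project:my_gbq_dataset_YOUR_NAME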

🚀 Further steps

You can check the full Storage to Tables documentation and try other features:

  • Load different input formats, such as JSON, PARQUET or AVRO files, or gzip-compressed files.

  • Allow unknown supplementary fields with the "bq_load_job_ignore_unknown_values" parameter, so that partners can provide new columns without any risk of service interruption.

  • Try the other ddl_mode values, such as "header" to infer the table schema from the header row, or "file_template", which allows you to provide a ddl_file directly in the source bucket (see the sketch below).
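
As a sketch only (check the Storage to Tables configuration file reference for the exact parameter placement), a table entry relying on the header row instead of a ddl_file might look like this:

    {
      "table_name": "stores",
      "short_description": "Store repository",
      "filename_template": "stores-{{FD_DATE}}-{{FD_TIME}}.csv",
      "ddl_mode": "header"
    }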

