Load data with Storage to Tables

Learn how to transfer data from files to database tables using the Storage to Tables operation.

💡 What is Storage to Tables?

A Storage to Tables (STT) data pipeline operation allows you to load data files from a Google Cloud Storage (GCS) bucket into one or several BigQuery databases.

Note that the uniqueness of the configuration is checked against the GCS bucket name AND directory combination. This means that you can have only one configuration per bucket/directory combination, as any new configuration will overwrite the previous one.

✅ Supported file types

Source data files

  • CSV and any delimited flat files

  • New line delimited JSON files

  • These two file types can be compressed using gzip

Databases

  • Google BigQuery

⚙️ How it works

Every time a new file matching the specified rule appears in a given directory of a Google Cloud Storage bucket:

  • it will be removed from the source directory,

  • if options have been set accordingly, the file will be copied to an archive directory located in the same storage, inside a folder named with the date contained in the filename,

  • the file data will be loaded into a BigQuery table matching its filename template for each database specified.

🤖 Automated metadata

Automatic metadata feature will add specific columns during the ingestion process related to the inpput source.

The added columns are:

tlr_ingestion_timestamp_utc (TIMESTAMP)
tlr_input_file_source_type (STRING)
tlr_input_file_name (STRING)
tlr_input_file_full_resource_name (STRING)

📋 How to deploy a Storage to Tables data operation

  1. Access your tailer folder (created during installation).

  2. Create a working folder as you want, and create a JSON file for your data operation inside.

  3. Prepare your JSON configuration file. Refer to this page to learn about all the parameters.

  4. Prepare a DDL file for each database table. Refer to this page to learn about all the parameters.

  5. Access your working folder by running the following command:

    cd "[path to your working folder]"
  6. To deploy the data operation, run the following command:

    tailer deploy your-file.json
  7. Log in to Tailer Studio to check the status and details of your data operation.

  8. Access your output table(s), and archive folder, if any, to check the result of the data operation.

Last updated