[V1-V2: deprecated] Table to Storage configuration file

This is the description of the JSON configuration file of a Table to Storage data operation.


The configuration file is in JSON format. It contains the following sections:

  • Global parameters: General information about the data operation.

  • Table copy parameters: Optionally, you can add a creation step for a table that will contain the result of the extraction.

Example

Here is an example of a TTS configuration file:

{
  "configuration_type": "table-to-storage",
  "configuration_id": "tts-some-id-example",
  "short_description" : "Short description of the job",
  "environment": "DEV",
  "account": "000111",
  "activated": true,
  "archived": false,
  "gcs_dest_bucket": "152-composer-test",
  "gcs_dest_prefix": "jultest_table_to_storage/",
  "gcp_project": "fd-tailer-datalake",
  "field_delimiter": "|",
  "print_header": true,
  "sql_file": "jul_test.sql",
  "compression": "None",
  "output_filename": "{{FD_DATE}}_some_file_name.csv",
  "copy_table": false,
  "dest_gcp_project_id": "GCP Project ID used if copy_table is true",
  "dest_gbq_dataset": "GBQ Dataset used if copy_table is true",
  "dest_gbq_table": "GBQ Table name used if copy_table is true",
  "dest_gbq_table_suffix": "dag_execution_date",
  "delete_dest_bucket_content": true
}
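Once saved locally (the file name below is just an illustration), such a configuration is typically deployed with the tailer deploy configuration command mentioned for the gcp_project_id parameter below:

tailer deploy configuration tts-some-id-example.json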

Global parameters

General information about the data operation.

Parameter
Description

configuration_type

type: string

mandatory

Type of data operation.

For a TTS data operation, the value is always "table-to-storage".

configuration_id

type: string

mandatory

ID of the data operation.

You can pick any name you want, but it has to be unique for this data operation type.

Note that in case of conflict, the newly deployed data operation will overwrite the previous one. To guarantee its uniqueness, the best practice is to name your data operation by concatenating:

  • your account ID,

  • the word "extract",

  • and a description of the data to extract.

short_description

type: string

optional

Short description of the table to storage data operation.

environment

type: string

mandatory

Deployment context.

Values: PROD, PREPROD, STAGING, DEV.

account

type: string

mandatory

Your account ID is a 6-digit number assigned to you by your Tailer Platform administrator.

activated

type: boolean

optional

Flag used to enable/disable the execution of the data operation.

Default value: true

archived

type: boolean

optional

Flag used to enable/disable the visibility of the data operation's configuration and runs in Tailer Studio.

Default value: false

gcs_dest_bucket

type: string

mandatory

Google Cloud Storage destination bucket.

This is the bucket to which the data will be extracted.

gcs_dest_prefix

type: string

mandatory

Path in the GCS bucket where the files will be extracted, e.g. "some/sub/dir".

delete_dest_bucket_content

type: boolean

optional

If set to true, this parameter triggers the preliminary deletion of any items present in the destination directory.

This prevents an issue when the same operation needs to be run again after a fix: if the first run generated file-0.csv and file-1.csv, and the second run only produces file-0.csv, the destination directory must be emptied at the beginning of the second run, or you will end up with a file-0.csv from the second run and a file-1.csv from the first run.

Note: if several table-to-storage operations write in the same directory at the same time and this parameter is set to true, some extracted files can be deleted by mistake. The best practice is to give each operation a dedicated subdirectory, as illustrated below.

Default value: false
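For instance (the path is purely illustrative), a dedicated subdirectory can be enforced through the gcs_dest_prefix parameter:

"gcs_dest_prefix": "exports/tts-some-id-example/",
"delete_dest_bucket_content": true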

gcp_project

type: string

mandatory

ID of the Google Cloud project containing the BigQuery instance.

gcp_project_id

type: string

optional

Set this to the same value as gcp_project to avoid being prompted for project selection when deploying with the tailer deploy configuration command.

field_delimiter

type: string

optional

Separator for fields in the CSV output file, e.g. ";".

Note: for a tab separator, set this parameter to "\t".

Default value: ","

print_header

type: boolean

optional

Print a header row in the exported data.

Default value: true

sql_file

type: string

mandatory

Path to the file containing the extraction query.

compression

type: string

optional

Compression mode for the output file.

Possible values: "None", "GZIP"

Note that if you specify "GZIP", a ".gz" extension will be added at the end of the filename: an output named "20220101_some_file_name.csv" becomes "20220101_some_file_name.csv.gz".

Default value: "None"

output_filename

type: string

mandatory

Template for the output filename.

You can use the following placeholders inside the name:

  • {{FD_DATE}}: The date format will be YYYYMMDD

  • {{FD_TIME}}: The time format will be hhmmss

Note: BigQuery splits the content into several numbered files if you export more than 1 GB of data. A number starting at 0 and left-padded to 12 digits is added after a "-" and before the extension. To ensure consistent behavior, this number is always added, even if you export less than 1 GB. For example, an operation with the output_filename "{{FD_DATE}}-{{FD_TIME}}_my_data_extraction.csv" executed on 2022-01-01 at 06:32:16 will generate the file 20220101-063216_my_data_extraction-000000000000.csv.

destination_format

type: string

optional

Defines the format of the output file.

Possible values: "NEWLINE_DELIMITED_JSON" (JSON file), "AVRO"

Note that if you specify "NEWLINE_DELIMITED_JSON", the field_delimiter parameter is not taken into account.

Default value: "CSV"
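For example (values are illustrative), exporting newline-delimited JSON instead of CSV only requires overriding a couple of the parameters shown in the example above:

"destination_format": "NEWLINE_DELIMITED_JSON",
"output_filename": "{{FD_DATE}}_some_file_name.json"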

Table copy parameters

If you want to create a copy of your output data in a BigQuery table, you need to set the following parameters.

Parameter
Description

copy_table

type: boolean

optional

Parameter used to enable a copy of the output data in a BigQuery table.

Default value: false

dest_gcp_project_id

mandatory if copy_table is set to "true"

ID of the GCP project that will contain the table copy.

dest_gbq_dataset

mandatory if copy_table is set to "true"

Name of the BigQuery dataset that will contain the table copy.

dest_gbq_table

mandatory if copy_table is set to "true"

Name of the BigQuery table copy.

dest_gbq_table_suffix

optional, to use only if copy_table is set to "true"

The only supported value for this parameter is "dag_execution_date".

This will add "_yyyymmdd" at the end of the table name to enable ingestion time partitioning.
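For instance (project, dataset and table names are placeholders), a configuration that also keeps a copy of the extract in BigQuery would include the following block:

"copy_table": true,
"dest_gcp_project_id": "my-gcp-project",
"dest_gbq_dataset": "my_dataset",
"dest_gbq_table": "my_data_extraction",
"dest_gbq_table_suffix": "dag_execution_date"

With dest_gbq_table_suffix set to "dag_execution_date", a run executed on 2022-01-01 would write to the table my_data_extraction_20220101.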
