[V1-V2: deprecated] Table to Storage configuration file
This page describes the JSON configuration file of a Table to Storage (TTS) data operation.
The configuration file contains the following sections:
- Global parameters: general information about the data operation.
- Table copy parameters: optional parameters used to create a BigQuery table containing the result of the extraction.
👁️🗨️ Example
Here is an example of a TTS configuration file:
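The configuration below is an illustrative sketch only: the project, bucket, dataset, query file and table names are hypothetical placeholders to adapt to your own context, and only parameters documented in the sections below are used.

```json
{
  "configuration_type": "table-to-storage",
  "configuration_id": "000099-tts-export-sales-eu",
  "short_description": "Daily export of the EU sales table to GCS",
  "environment": "PROD",
  "account": "000099",
  "activated": true,
  "archived": false,
  "gcp_project": "my-gcp-project",
  "gcp_project_id": "my-gcp-project",
  "gcs_dest_bucket": "my-destination-bucket",
  "gcs_dest_prefix": "exports/sales/eu",
  "delete_dest_bucket_content": true,
  "field_delimiter": ";",
  "print_header": true,
  "sql_file": "export_sales_eu.sql",
  "compression": "GZIP",
  "output_filename": "sales_eu_export.csv",
  "destination_format": "CSV",
  "copy_table": true,
  "dest_gcp_project_id": "my-gcp-project",
  "dest_gbq_dataset": "exports",
  "dest_gbq_table": "sales_eu_export",
  "dest_gbq_table_suffix": "dag_execution_date"
}
```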
🌐 Global parameters
General information about the data operation.
Parameter | Description |
---|---|
configuration_type type: string mandatory | Type of data operation. For a TTS data operation, the value is always "table-to-storage". |
configuration_id type: string mandatory | ID of the data operation. You can pick any name you want, but it has to be unique for this data operation type. Note that in case of conflict, the newly deployed data operation will overwrite the previous one. To guarantee its uniqueness, the best practice is to build the ID by concatenating several identifying elements (for example, your account ID, the type of the data operation and a short description of the data flow). |
short_description type: string optional | Short description of the table to storage data operation. |
environment type: string mandatory | Deployment context. Values: PROD, PREPROD, STAGING, DEV. |
account type: string mandatory | Your account ID is a 6-digit number assigned to you by your Tailer Platform administrator. |
activated type: boolean optional | Flag used to enable/disable the execution of the data operation. Default value: true |
archived type: boolean optional | Flag used to enable/disable the visibility of the data operation's configuration and runs in Tailer Studio. Default value: false |
gcs_dest_bucket type: string mandatory | Google Cloud Storage destination bucket, i.e. the bucket to which the data will be extracted. |
gcs_dest_prefix type: string mandatory | Path in the GCS bucket where the files will be extracted, e.g. "some/sub/dir". |
delete_dest_bucket_content type: boolean optional | If set to true, any items present in the destination directory are deleted before the extraction runs. This prevents stale files when the same operation is run again after a fix: if the first run generated file-0.csv and file-1.csv, and the second run only produces file-0.csv, the destination directory must be emptied at the beginning of the second run, otherwise you end up with file-0.csv from the second run and file-1.csv from the first. Default value: false |
gcp_project type: string mandatory | ID of the Google Cloud project containing the BigQuery instance. |
gcp_project_id type: string optional | Set this to the same value as gcp_project to avoid being prompted for a project selection when deploying with the tailer deploy configuration command. |
field_delimiter type: string optional | Separator for fields in the CSV output file, e.g. ";". Note: for a tab separator, set this to "\t". Default value: "|" |
print_header type: boolean optional | Print a header row in the exported data. Default value: true |
sql_file type: string mandatory | Path to the file containing the extraction query. |
compression type: string optional | Compression mode for the output file. Possible values: "None", "GZIP". Note that if you specify "GZIP", a ".gz" extension will be added at the end of the filename. Default value: "None" |
output_filename type: string mandatory | Template for the output filename. Placeholders can be used inside the name. |
destination_format type: string optional | Format of the output file. Possible values: "NEWLINE_DELIMITED_JSON" (JSON file), "AVRO". Note that if you specify "NEWLINE_DELIMITED_JSON", the field_delimiter parameter is not taken into account. Default value: "CSV" |
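For instance, a configuration exporting compressed newline-delimited JSON could contain the following fragment (the filename is a hypothetical placeholder); field_delimiter would be ignored for this format, and the file would be written with a ".gz" extension appended:

```json
{
  "destination_format": "NEWLINE_DELIMITED_JSON",
  "compression": "GZIP",
  "output_filename": "daily_export.json"
}
```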
👬 Table copy parameters
If you want to create a copy of your output data in a BigQuery table, you need to set the following parameters.
Parameter | Description |
---|---|
copy_table type: boolean optional | Parameter used to enable a copy of the output data in a BigQuery table. Default value: false |
dest_gcp_project_id mandatory if copy_table is set to "true" | ID of the GCP project that will contain the table copy. |
dest_gbq_dataset mandatory if copy_table is set to "true" | Name of the BigQuery dataset that will contain the table copy. |
dest_gbq_table mandatory if copy_table is set to "true" | Name of the BigQuery table copy. |
dest_gbq_table_suffix optional, to use only if copy_table is set to "true" | The only supported value for this parameter is "dag_execution_date". This will add "_yyyymmdd" at the end of the table name to enable ingestion time partitioning. |
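For example, the illustrative fragment below (project, dataset and table names are placeholders) enables the table copy and adds the execution date suffix, so the copy would be written to a table named sales_eu_export_yyyymmdd in the exports dataset:

```json
{
  "copy_table": true,
  "dest_gcp_project_id": "my-gcp-project",
  "dest_gbq_dataset": "exports",
  "dest_gbq_table": "sales_eu_export",
  "dest_gbq_table_suffix": "dag_execution_date"
}
```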