Tailer Documentation

Convert XML to CSV configuration file

This is the description of the JSON configuration file for a Convert XML to CSV data operation.


Last updated 2 years ago


The configuration file is in JSON format. It contains the following sections:

  • Global parameters: General information about the data operation.

  • Working folder parameters: Information related to the working folder for the files.

  • Conversion parameters: Information about the input file to process and the output files generated.

Example

Here is an example of a Convert XML to CSV configuration file:

{
    "$schema": "http://jsonschema.tailer.ai/schema/xml-conversion-veditor",
    "configuration_type": "xml-conversion",
    "configuration_id": "000099-test-xml-conversion",
    "environment": "DEV",
    "account": "000099",
    "activated": true,
    "archived": false,
    "doc_md": "readme.md",
    "gcp_project_id": "fd-io-jarvis-demo-dlk",
    "gcs_bucket": "fd-io-demo-ds",
    "gcs_working_directory": "test_xml_conversion",
    "credentials": {
        "gcp-credentials.json": {
            "content": {
                "cipher_aes": "dd34e56f4...",
                "tag": "3d968340...",
                "ciphertext": "046ffe41f9c00448ea0f816119...",
                "enc_session_key": "36a0f8ffe1b1f0..."
            }
        }
    },
    "filename_templates": [
        {
            "filename_template": "coupon_{{FD_DATE}}.xml",
            "file_description": "This is a description.",
            "xsd_schema_file": "coupon.xsd",
            "output_suffix_filters":[
                "advantage.tsv",
                "barcodeType.tsv"
            ]
        }
    ]
}

Global parameters

General information about the data operation.

Parameter
Description

$schema

type: string

optional

The URL of the JSON schema that describes the properties your configuration must satisfy. Most code editors can use it to validate your configuration, display help boxes, and highlight issues.

configuration_type

type: string

mandatory

Type of data operation.

For a Convert XML to CSV data operation, the value is always "xml-conversion".

configuration_id

type: string

mandatory

ID of the data operation.

You can pick any name you want, but it has to be unique for this data operation type.

Note that in case of conflict, the newly deployed data operation will overwrite the previous one. To guarantee its uniqueness, the best practice is to name your data operation by concatenating:

  • your account ID,

  • "xml-conversion",

  • and a description of the data to convert.
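The naming convention above can be sketched in a few lines of Python. This is only an illustration: `build_configuration_id` is a hypothetical helper, and the hyphen separator and lowercase slug are assumptions inspired by the example ID "000099-test-xml-conversion", not rules stated by Tailer.

```python
def build_configuration_id(account: str, description: str) -> str:
    """Concatenate the account ID, the operation type and a data description."""
    # Assumption: words are lowercased and joined with hyphens.
    slug = "-".join(description.lower().split())
    return f"{account}-xml-conversion-{slug}"


print(build_configuration_id("000099", "Coupon files"))
# prints 000099-xml-conversion-coupon-files
```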

environment

type: string

mandatory

Deployment context.

Values: PROD, PREPROD, STAGING, DEV.

account

type: string

mandatory

Your account ID is a 6-digit number assigned to you by your Tailer Platform administrator.

activated

type: boolean

optional

Flag used to enable/disable the execution of the data operation.

If not specified, the default value will be "true".

archived

type: boolean

optional

Flag used to enable/disable the visibility of the data operation's configuration and runs in Tailer Studio.

If not specified, the default value will be "false".

doc_md

type: string

optional

Path to a file containing a detailed description of the data operation. The file must be in Markdown format.
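As a rough illustration of the rules listed for the global parameters, the following Python sketch checks a configuration dictionary before deployment. It is not part of the Tailer SDK: `check_global_parameters` and `with_defaults` are hypothetical helpers, and the checks only restate the constraints described above (mandatory keys, allowed environments, 6-digit account ID, default flag values).

```python
import re
from typing import Any, Dict, List

ALLOWED_ENVIRONMENTS = ("PROD", "PREPROD", "STAGING", "DEV")
MANDATORY_GLOBAL_KEYS = ("configuration_type", "configuration_id", "environment", "account")


def check_global_parameters(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in the global parameters (empty if none)."""
    problems = [
        f"missing mandatory parameter: {key}"
        for key in MANDATORY_GLOBAL_KEYS
        if key not in config
    ]
    if "configuration_type" in config and config["configuration_type"] != "xml-conversion":
        problems.append('configuration_type must be "xml-conversion"')
    if "environment" in config and config["environment"] not in ALLOWED_ENVIRONMENTS:
        problems.append("environment must be one of PROD, PREPROD, STAGING, DEV")
    account = config.get("account")
    if account is not None and not re.fullmatch(r"[0-9]{6}", str(account)):
        problems.append("account must be a 6-digit ID")
    return problems


def with_defaults(config: Dict[str, Any]) -> Dict[str, Any]:
    """Fill in the documented defaults for the optional flags."""
    return {"activated": True, "archived": False, **config}


example = {
    "configuration_type": "xml-conversion",
    "configuration_id": "000099-test-xml-conversion",
    "environment": "DEV",
    "account": "000099",
}
print(check_global_parameters(example))    # prints []
print(with_defaults(example)["archived"])  # prints False
```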

Working folder parameters

Information related to the input/output working directory in Google Cloud Storage.

Parameter
Description

gcp_project_id

type: string

mandatory

Google Cloud Platform project ID for the bucket where the data is going to be converted.

gcs_bucket

type: string

mandatory

Name of the GCS bucket where the data is going to be converted.

gcs_working_directory

type: string

mandatory

Path in the GCS bucket where the input files will be placed, and the output files generated, e.g. "some/sub/dir".

gcp_credentials_secret

type: dict

mandatory

Encrypted credentials needed to read and write data in the GCS bucket.

Conversion parameters

Information about the input file to process and the output files generated.

Parameter
Description

filename_templates

type: array

mandatory

Array containing one or more entries. Each entry is defined by a filename_template and the related parameters described below.

filename_template

type: string

mandatory

When a file with a name matching the template set here is added to the specified GCS folder, the conversion will be launched automatically.

The following placeholders are currently supported:

  • "FD_DATE" looks for an 8-digit date (e.g. "20191015").

  • "FD_TIME" looks for a 6-digit time (e.g. "124213").

  • "FD_BLOB_XYZ", where XYZ is a non-zero positive integer, looks for a string of characters of XYZ length.

Example 1

This template:

"stores_{{FD_DATE}}_{{FD_TIME}}.txt"

will allow you to process files such as:

"stores_20201116_124213.txt"

Example 2

This template:

"{{FD_DATE}}_{{FD_BLOB_5}}_fixedvalue_{{FD_BLOB_11}}.gz"

will allow you to process files such as:

"20201116_12397_fixedvalue_12312378934.gz"
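To make the placeholder rules more concrete, here is a minimal Python sketch that turns a filename template into a regular expression. It is not Tailer's actual matching code: `template_to_regex` and `matches` are hypothetical helpers, and it assumes that FD_BLOB_XYZ accepts any characters and that the whole file name must match the template.

```python
import re

# Recognizes {{FD_DATE}}, {{FD_TIME}} and {{FD_BLOB_n}} with n a non-zero positive integer.
PLACEHOLDER = re.compile(r"\{\{(FD_DATE|FD_TIME|FD_BLOB_([1-9][0-9]*))\}\}")


def template_to_regex(template: str) -> re.Pattern:
    """Build a regular expression equivalent to a filename template."""
    parts = []
    position = 0
    for match in PLACEHOLDER.finditer(template):
        # Literal text between placeholders is matched as-is.
        parts.append(re.escape(template[position:match.start()]))
        name = match.group(1)
        if name == "FD_DATE":
            parts.append(r"[0-9]{8}")   # 8-digit date
        elif name == "FD_TIME":
            parts.append(r"[0-9]{6}")   # 6-digit time
        else:
            parts.append(".{%s}" % match.group(2))  # assumption: any characters
        position = match.end()
    parts.append(re.escape(template[position:]))
    return re.compile("".join(parts))


def matches(template: str, filename: str) -> bool:
    """Return True when the whole file name fits the template."""
    return template_to_regex(template).fullmatch(filename) is not None


print(matches("coupon_{{FD_DATE}}.xml", "coupon_20210404.xml"))  # prints True
print(matches("coupon_{{FD_DATE}}.xml", "coupon_2021.xml"))      # prints False
```

In this sketch, separators such as underscores are literal text, so they must appear in the template wherever they appear in the file name.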

file_description

type: string

mandatory

A short description of the file template entry.

xsd_schema_file

type: string

mandatory

Name of the XSD file that will be used to validate the XML file before the conversion.

In the current version, only one XSD can be used per XML entry. The XSD file name must be identical to the corresponding XML file name, excluding suffixes. For example "coupon.xsd" can be used to validate "coupon_20210404.xml".
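One possible reading of the naming rule above is sketched below in Python. The helper `xsd_matches_xml` is hypothetical, and it assumes that a "suffix" is anything following an underscore after the XSD base name; the actual rule applied by Tailer may differ.

```python
from pathlib import PurePosixPath


def xsd_matches_xml(xsd_name: str, xml_name: str) -> bool:
    """Check whether an XSD file name corresponds to an XML file name."""
    xsd_stem = PurePosixPath(xsd_name).stem  # "coupon.xsd" -> "coupon"
    xml_stem = PurePosixPath(xml_name).stem  # "coupon_20210404.xml" -> "coupon_20210404"
    # Assumption: suffixes are separated from the base name by an underscore.
    return xml_stem == xsd_stem or xml_stem.startswith(xsd_stem + "_")


print(xsd_matches_xml("coupon.xsd", "coupon_20210404.xml"))  # prints True
print(xsd_matches_xml("coupon.xsd", "stores_20210404.xml"))  # prints False
```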

output_suffix_filters

type: array of string

optional

Names of the output files to be kept after the conversion.

If the XML file contains many child entities, the conversion will create many CSV files (one for each entity). This filter lets you avoid uploading unnecessary files to the output bucket.

It works by finding an occurrence of the string in the filename. For example, if a file named "coupon_20210404_advantage.tsv" is generated and the filter "advantage.tsv" was added, then this file will be kept.
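The substring matching described above can be illustrated with a short Python sketch. `keep_outputs` is a hypothetical helper, the file name "coupon_20210404_store.tsv" is made up for the example, and keeping every file when no filter is set is an assumption based on the parameter being optional.

```python
from typing import Iterable, List


def keep_outputs(generated_files: Iterable[str], filters: Iterable[str]) -> List[str]:
    """Keep the generated files whose name contains at least one filter string."""
    filters = list(filters)
    if not filters:
        # Assumption: without any filter, every generated file is kept.
        return list(generated_files)
    return [
        name
        for name in generated_files
        if any(suffix in name for suffix in filters)
    ]


generated = [
    "coupon_20210404_advantage.tsv",
    "coupon_20210404_barcodeType.tsv",
    "coupon_20210404_store.tsv",
]
print(keep_outputs(generated, ["advantage.tsv", "barcodeType.tsv"]))
# prints ['coupon_20210404_advantage.tsv', 'coupon_20210404_barcodeType.tsv']
```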

Note on gcp_credentials_secret: you should have generated credentials when setting up GCP (see "Set up Google Cloud Platform"). To learn how to encrypt them, refer to "Encrypt your credentials".